ABSTRACT



A system and method for automatically generating person- 
alized user profiles and for utilizing the generated profiles to 
perform adaptive Internet or computer data searches is 
provided. In accordance with the present invention, particu- 
lar linguistic patterns and their frequency of recurrence are 
extracted from personal texts provided by the users of the 
system of the present invention and stored in a user profile 
data file such that the user profile data file is representative 
of the user's overall linguistic patterns and the frequencies 
of recurrence thereof. All documents in a remote computer 
system, such as the Internet, are likewise analyzed and their 
linguistic patterns and pattern frequencies are also extracted 
and stored in corresponding document profiles. When a 
search for particular data is initiated by the user, linguistic 
patterns are also extracted from a search string provided by 
the user into a search profile. The user profile is then cross 
matched with the search profile and the document profiles to 
determine whether any linguistic patterns match in all three 
profiles and to determine the magnitude of the match based 
on summation of respective frequencies of recurrence of the 
matching patterns. The documents with document profiles 
having the highest matching magnitudes are presented to the 
user as not only matching the subject of the search string, but 
also as corresponding to the user's cultural, educational, and 
social backgrounds as well as the user's psychological 
profile. 

62 Claims, 8 Drawing Sheets 
SYSTEM AND METHOD FOR GENERATING
PERSONALIZED USER PROFILES AND FOR 
UTILIZING THE GENERATED USER 
PROFILES TO PERFORM ADAPTIVE 
INTERNET SEARCHES 

RELATED APPLICATIONS 

This application claims priority from U.S. Provisional 
Patent Application Ser. No. 60/116,582, entitled "Internet 
Search Vehicles" which was filed on Jan. 20, 1999. 

FIELD OF THE INVENTION 

The present invention relates generally to computer
data searches and more particularly to a system and method
for automatically generating personalized user profiles and 
for utilizing the generated profiles to perform adaptive 
Internet or computer data searches. 

BACKGROUND OF THE INVENTION 

In recent years, computers have taken the world by storm. 
Today, most businesses entirely rely on computers to con- 
duct daily operations. In the academic world, computers 
have become essential tools for learning, teaching and 
research. In homes, computers are used to perform daily 
tasks ranging from paying bills to playing games. The one 
unifying requirement for all computer applications is the 
ability of a user to utilize a computer to locate particular 
information or data desired by the user. 

During the past few years, the quantity and diversity of 
information and services available over the public (e.g. 
Internet) and private (e.g. Intranet) local and wide area 
networks has grown substantially. In particular, the variety 
of information accessible through Internet-based services is 
growing rapidly both in terms of scope and depth. In simple 
terms, the Internet is a massive collection of individual 
computer networks operated by government, industry, 
academia, and private parties that are linked together to 
exchange information. While originally, the Internet was 
used mostly by scientists, the advent of the World Wide Web 
has brought the Internet into mainstream use. The World 
Wide Web (hereafter "WWW") is an international, virtual- 
network-based information service composed of Internet 
host computers that provide on-line information in a specific 
hypertext format. WWW host servers provide hypertext
markup language (HTML) formatted documents using the
hypertext transfer protocol (HTTP). Information on the WWW
is accessed with a hypertext browser, such as Netscape
Navigator or Microsoft Internet Explorer. Web sites are collections of
interconnected WWW documents. 

Typically, users communicate with the Internet through a 
communication gateway that may be implemented and con- 
trolled by an Internet service provider (i.e. an ISP) — a 
company that offers a user access to the Internet and the 
WWW through a software application that controls com- 
munication between the user's computer and the communi- 
cation gateway. The role of the ISP may also be taken 
directly by a particular organization that allows internet 
access to its employees or members. The user can access and 
navigate the WWW using a hypertext browser application 
residing on, and executed by, the user's computer. 

No hierarchy exists in the WWW, and the same informa- 
tion may be found by many different approaches. Hypertext 
links in WWW HTML documents allow readers to move 
from one place in a document to another (or even between 
documents) as they want to. One of the advantages of 
WWW is that there is no predetermined order that must be
followed in navigating through various WWW documents.
Readers can explore new sources of information by following
links from place to place. Following links has been made as
easy as clicking a mouse button on the link related to the
subject a user wants to access. Each WWW document also
has a unique uniform resource locator ("URL") that serves
as an "address" that, when followed, leads the user to the
document or file location on the WWW. Using the browser,
the user can also mark and store "favorites", that is, URLs of
particular WWW documents that interest the user, such that
the user can quickly and easily return to these documents in
the future by selecting them from the favorites list in the
browser.

Because of the vastness of the Internet and the WWW,
locating specific information desired by the user can be very
difficult. To facilitate the search for information, a number of
"search engines" have been developed and implemented. A
search engine is a software application that searches the
Internet for web sites containing information on the subject
in which the user is interested. These searches are
accomplished in a variety of ways, all well known in the art.
Typically, a user first inputs a "search string" to the hypertext
browser containing key words representative of the
information desired by the user. The search engine then applies
the search string to a previously constructed index of a
multitude of web sites to locate a certain number of web sites
having content that matches the user's search string.
The located web site URLs are then presented to the user
in the order of relevance to the key words in the user's search
string. For example, a user providing the key word PLANT
would obtain an exhaustive list of all registered sites that
refer to plants. This list, however, would be so large that the
user would want to limit the search. Depending on the
search engine used, the user could limit the search by
entering a combination of key words such as the following:
PLANT AND FLOWER AND GARDEN. This would limit
the search to only Internet sites that contain all three words.
In addition, users could further limit the search by entering
PLANT AND FLOWER AND GARDEN NOT TREE NOT
ORCHID. The results from this search would be further
limited to exclude sites in which trees and orchids are listed
as keywords.
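
The key word filtering described above can be sketched as follows.
This is only an illustration, not how any particular search engine
is implemented: the function name `matches`, the toy `index`, and the
example.com URLs are all hypothetical.

```python
def matches(required, excluded, site_keywords):
    """Return True if the site lists every required key word
    and none of the excluded ones (case-insensitive)."""
    keywords = {k.upper() for k in site_keywords}
    has_all = {k.upper() for k in required} <= keywords
    has_excluded = bool({k.upper() for k in excluded} & keywords)
    return has_all and not has_excluded


# Hypothetical index: URL -> key words registered for that site.
index = {
    "http://example.com/roses": ["plant", "flower", "garden"],
    "http://example.com/orchids": ["plant", "flower", "garden", "orchid"],
    "http://example.com/factory": ["plant", "machinery"],
}

# PLANT AND FLOWER AND GARDEN NOT TREE NOT ORCHID
required = ["PLANT", "FLOWER", "GARDEN"]
excluded = ["TREE", "ORCHID"]

hits = [url for url, kws in index.items() if matches(required, excluded, kws)]
print(hits)
```

With this toy index only the roses page survives: the orchid page is
excluded by NOT ORCHID, and the factory page lacks FLOWER and GARDEN.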

A number of approaches have been developed to improve
the performance and accuracy of typical key word searches.
For example, U.S. Pat. No. 5,845,278, issued to Kirsch et
al., teaches approaches to establishing a quantitative basis for
selecting client database sets (i.e. Internet documents or web
sites) that include the use of comprehensive indexing
strategies, ranking systems based on training queries, expert
systems using rule-based deduction methodologies, and
inference networks. These approaches were used to examine
knowledge base descriptions of client document collections
or databases.

However, the key word searching approaches utilized by
previously known search engines suffer from a number of
significant disadvantages. Most search systems are viewed
as often ineffective in identifying the likely most relevant
documents. Accordingly, the users are often presented with
overwhelming amounts of information in response to their
key words. Thus, using proper key word searching
techniques becomes an art in itself, one that is outside the
capabilities of most Internet users.

Most importantly, typical key word and even more
advanced searches only provide the user with search results
that depend entirely on the search string entered by the user,
without any regard to the user's cultural, educational, and social
backgrounds or the user's psychological profile. The results
returned by the search engines are tailored only to the search
string provided by the user and not to the user's background.
None of the previously known search engines tailor the results
of a user's searches to his or her background and
unexpressed interests. For example, a twelve year old child
using key word searches on the Internet for some
information on computers may be presented with a multitude of
documents that are far above the child's reading and
educational level. In another example, a physician searching the
Internet for information on a particular disease may be
presented with dozens of web sites that contain very generic
information, while the physician's "unexpressed" interest
was to find web sites about the disease that are on his
educational and professional level.

It would thus be desirable to provide a system and method 
for extracting and using linguistic patterns of textual data to 
assist a user in locating requested data that, in addition to 
matching the user's specific request, also corresponds to the 
user's professional, cultural, educational, and social
backgrounds as well as to the user's psychological profile and
thus addresses the user's "unexpressed" requests.

SUMMARY OF THE INVENTION 

This invention relates to use of linguistic patterns of 
documents to assist a user in locating requested data that, in 
addition to matching the user's specific request, also corre- 
sponds to the user's cultural, educational, professional, and 
social backgrounds as well as to the user's psychological 
profile, and thus addresses the user's "unexpressed" 
requests. The present invention provides a system and 
method for automatically generating a personalized user 
profile based on linguistic patterns of documents provided 
by the user and for utilizing the generated profile to perform 
adaptive Internet or computer data searches. 

The system of the present invention advantageously over- 
comes the drawbacks of previously known data searching 
techniques. As was noted earlier, typical key word and even 
more advanced searches only provide the user with search 
results that depend entirely on the search string entered by 
the user, without any regard to the user's cultural,
educational, professional, and social backgrounds or the
user's psychological profile.

All texts composed by the user, or adopted by the user as 
favorite or inimical (such as a favorite book or short story), 
contain certain recurring linguistic patterns, or combinations 
of various parts of speech (nouns, verbs, adjectives, etc.) in 
sentences that reflect the user's cultural, educational, social 
backgrounds and the user's psychological profile. Research 
has shown that most people have readily identifiable lin- 
guistic patterns in their expression and that people with 
similar cultural, educational, and social backgrounds will 
have similar linguistic patterns. Furthermore, research has
shown that such factors as psychological profile, life
experience, profession, socioeconomic status, educational
background, etc. contribute to determining the frequency of
occurrence of particular linguistic patterns within the user's
written expression.
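
As a rough illustration of what a "linguistic pattern" and its
frequency might look like in practice, the sketch below pulls one
noun-verb-adjective triad out of each sentence and counts how often
each triad recurs. The tiny `LEXICON`, the rule of taking the first
noun, verb and adjective found, and the function names are all
assumptions made for this example; a real system would need a proper
part-of-speech tagger and its own selection rule.

```python
import re
from collections import Counter

# Toy part-of-speech lexicon (an assumption for illustration only).
LEXICON = {
    "quasars": "noun", "telescope": "noun", "dog": "noun",
    "emit": "verb", "is": "verb", "chased": "verb",
    "bright": "adjective", "distant": "adjective", "red": "adjective",
}


def extract_segment(sentence):
    """Return (noun, verb, adjective) using the first word of each
    part of speech in the sentence, or None if one is missing."""
    found = {}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        pos = LEXICON.get(word)
        if pos and pos not in found:
            found[pos] = word
    if all(p in found for p in ("noun", "verb", "adjective")):
        return (found["noun"], found["verb"], found["adjective"])
    return None


def build_profile(text):
    """Split text into sentences, extract one segment per sentence,
    and count how often each identical segment recurs."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    segments = (extract_segment(s) for s in sentences)
    return Counter(seg for seg in segments if seg is not None)


profile = build_profile(
    "Quasars emit bright light. The quasars emit bright jets. "
    "The dog chased a red ball."
)
print(profile)
```

Here the triad (quasars, emit, bright) is counted twice and (dog,
chased, red) once, so the resulting counter plays the role of a very
small profile of patterns and their frequencies.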

In accordance with the present invention, particular lin- 
guistic patterns and their frequencies of occurrence are 
extracted from the texts provided by a user of the system of 
the present invention and stored in a user profile data file. 
The user profile data file is thus representative of the user's
overall linguistic patterns and their respective frequencies.
All documents in a remote computer system, such as the
Internet, are likewise analyzed and their linguistic patterns
and frequencies thereof also extracted and stored in
corresponding document profiles. When a search for particular
data is initiated by the user, linguistic patterns are also
extracted from a search string provided by the user into a
search profile. The user profile is then cross matched with
the search profile and the document profiles to determine
whether any linguistic patterns match in all three profiles
and to determine the magnitude of the match based on
summation of relative frequencies of matching patterns in
the user profile and the document profile. The documents
with document profiles having the highest matching
magnitudes are presented to the user as not only matching the
subject of the search string, but also as corresponding to the
user's cultural, educational, and social backgrounds as well
as the user's psychological profile. Thus, a world renowned
physicist searching for information on quasars would be
presented with very sophisticated physics documents that
are oriented towards his level of expertise.
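
The cross matching just described can be sketched in a few lines.
This is a minimal reading of the text, not a definitive
implementation: profiles are modelled as mappings from a pattern to
its frequency, the search profile as a plain set of patterns, and the
names `match_magnitude` and `rank_documents`, the sample patterns and
the document labels are hypothetical.

```python
from collections import Counter


def match_magnitude(user_profile, search_profile, document_profile):
    """Sum the user and document frequencies of every pattern that
    appears in all three profiles."""
    common = set(user_profile) & set(search_profile) & set(document_profile)
    return sum(user_profile[p] + document_profile[p] for p in common)


def rank_documents(user_profile, search_profile, document_profiles):
    """Return document addresses ordered by decreasing match
    magnitude, dropping documents with no common pattern."""
    scored = [
        (match_magnitude(user_profile, search_profile, prof), addr)
        for addr, prof in document_profiles.items()
    ]
    scored = [(m, addr) for m, addr in scored if m > 0]
    return [addr for m, addr in sorted(scored, key=lambda t: -t[0])]


# Sample patterns (noun, verb, adjective); purely illustrative.
A = ("quasar", "emits", "bright")
B = ("telescope", "is", "distant")
C = ("dog", "chased", "red")

user = Counter({A: 5, B: 2})
search = {A, B}
documents = {
    "doc1": Counter({A: 3}),
    "doc2": Counter({A: 1, B: 1}),
    "doc3": Counter({C: 9}),
}

ranking = rank_documents(user, search, documents)
print(ranking)
```

In this example doc2 scores (5 + 1) + (2 + 1) = 9 and doc1 scores
5 + 3 = 8, so doc2 is presented first, while doc3 shares no pattern
with the user and search profiles and is left out.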

It should be noted that the user's background and psy- 
chological characteristics are not evident directly from the 
linguistic patterns themselves or from their frequencies.
Accordingly, the system of the present invention matches the 
user's linguistic patterns to the linguistic patterns of data 
requested by the user without extracting any actual infor- 
mation about the user's background and psychological char- 
acteristics from the user profile. Thus, the user's privacy is 
not impinged by the creation and retention of the user 
profile. 

The profiling/search system includes a local computer 
system, connected to a remote computer network (e.g. the 
Internet) via a telecommunication link. The local computer 
system includes a control unit and related circuitry for
controlling the operation of the local computer system and
for executing application programs; a memory for
temporarily storing control program instructions and variables
during the execution of application programs by the control
unit; a storage memory for long term storage of data and
application programs; and input devices for accepting input
from the user. The local computer system further includes
output devices for providing output data to the user and a
communication device for transmitting to, and receiving 
data from, the remote computer system via the telecommu- 
nication link. The remote computer system includes a com- 
munication gateway connected to the telecommunication 
link, a remote data storage system for long term data storage, 
and a remote computer system control unit (hereinafter RCS 
control unit). 

In summary, the system of the present invention operates 
in three separate independent stages, each stage being con- 
trolled by a particular control program executed by one of 
the local computer system and the remote computer system. 
In a first stage, a user profiling control program is executed 
to generate or update a user profile computer file represen- 
tative of the user's linguistic patterns and the frequencies 
with which these patterns recur in texts submitted by the user 
and/or automatically acquired by the inventive system. The 
user is then invited to provide textual data composed by the 
user such as e-mail messages, memorandums, essays as well 
as documents composed by others that the user has adopted 
as "favorites", such as favorite web sites, short stories, etc. 
These textual documents are temporarily stored in a user 
data file. The inventive system also monitors the user's data 
searching and data browsing (e.g. Internet browsing) to 
automatically add additional textual information to the user 
data file. Once the user data file attains a sufficient size, or 
when other criteria for updating the user profile are met, the 
system executes a profile extraction subroutine to create/
update the user profile by extracting linguistic patterns from
the user data file.

During the profile extraction subroutine, the system
retrieves individual textual documents from the user data
file, and separates each document into sentences. The system
then extracts a linguistic pattern, or a segment, from each
sentence by first identifying words in the
sentence as being particular parts of speech (i.e. nouns,
verbs, adjectives etc.), and then selecting a predetermined
combination of the identified parts of speech and storing this
combination as a segment. In a preferred embodiment of the
present invention, each segment comprises a triad of three
parts of speech: noun-verb-adjective. This segment
extraction process is repeated for all textual documents in the user
data file. The system then groups identical segments together
and determines their frequency of occurrence in the user
profile. Thus, the resulting user profile contains the linguistic
patterns from all texts submitted by the user (or
automatically gathered by the system) and the frequencies with
which those patterns recur within the texts.

In a second stage of the present invention, a data profiling
control program is executed to generate data item profile
computer files, representative of linguistic patterns and their
respective frequencies, of all data items. The data items may
include documents, web sites, and other textual data that
may be subjected to a search by the user. A list of all data
items and their respective data addresses (such as Internet
URL addresses) is first provided to the system. The data item
profile generation procedure is then performed for each data
item in the list in a similar manner to the user-profiling
procedure, except that data item address information is
stored in each data item's profile. Thus, the resulting data
item profile of each data item contains the data item address,
the linguistic patterns of the data item and the frequencies
with which those patterns recur therein.

In a third stage of the present invention, the system
executes a data searching program that enables a user to
utilize the system to perform advanced searches for desired
data files, such that the data files returned as search results
correspond to the user's social, educational, and cultural
backgrounds and to the user's psychological profile. The
search program is initiated when the user provides a search
string representative of data requested by the user to the
system. The system then creates a search profile
representative of linguistic patterns in the search string in a similar
manner to the user-profiling procedure, except that
frequencies of recurring segments are not recorded in the search
profile. Optionally, the system expands the search profile by
generating additional segments that contain synonyms of the
parts of speech in the existing segments already in the search
profile, and storing the additional segments therein.

After the search profile is complete, the system retrieves
the user profile of the user performing the search and
compares the segments stored in the user profile with the
segments stored in the search profile to determine a number
of matches between various segments in each of the profiles
and then, for each matching segment, records the frequency
with which the matching segment recurs within the user
profile. The system then applies the original search string to
a standard match engine to obtain a list of data item
addresses that potentially match the user's search
requirements and then retrieves the data item profiles corresponding
to the data item addresses on the list. This procedure is
optional but is recommended because a direct linguistic
pattern search over all data items stored on the remote
computer system can be very time consuming given the
modern computing and data transfer technologies.

The system then compares, for each data item profile, the
segments stored in the data item profile with the segments
stored in the search profile to determine a number of matches
between various segments in each of the profiles and then,
for each matching segment, records the frequency with
which the matching segment recurs within the data item
profile. A match value is then determined by the system for
each segment in the data item profile that also appears in the
search profile and in the user profile, by adding the
frequency of the segment's occurrence in the data item profile
to the frequency of the segment's occurrence in the user
profile. Finally, the system computes a final value for each
data item profile by adding together the match values of all
matching segments for each data item profile. The final value is
representative of the degree to which the linguistic pattern of
the data item matches the linguistic pattern of the user in
light of the linguistic pattern and subject matter of the search
string. The data items, corresponding to data item profiles
having the highest final values, are then retrieved by the
system. The system then presents the user with several data
items having the highest final values, starting with the data
item with the highest final value.

Other objects and features of the present invention will
become apparent from the following detailed description
considered in conjunction with the accompanying drawings.
It is to be understood, however, that the drawings are
designed solely for purposes of illustration and not as a
definition of the limits of the invention, for which reference
should be made to the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, wherein like reference characters denote
similar elements throughout the several views:
FIG. 1 is a schematic block diagram of a profiling/search
system for automatically generating personalized user
profiles and for utilizing the generated profiles to perform
adaptive Internet or computer data searches;
FIG. 2 is a logic flow diagram representative of a user
profiling control program executed by the profiling/search
system of FIG. 1 in accordance with the present invention;
FIGS. 3 to 4 are logic flow diagrams representative of a
profile extraction subroutine program executed by the
profiling/search system of FIG. 1 in accordance with the
present invention;
FIGS. 5 to 6 are logic flow diagrams representative of a
data profiling control program executed by the profiling/search
system of FIG. 1 in accordance with the present invention;
and
FIGS. 7 to 8 are logic flow diagrams representative of a
data searching program executed by the profiling/search
system of FIG. 1 in accordance with the present invention.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS

Although the present invention is described with
reference to interfacing a local computer workstation with the
Internet, it should be understood that the system and method
of the present invention may be applied, without departing from
the spirit of the invention, to any arrangement where the
local computer workstation is connected via a
telecommunication link to a remote computer system such as a
workstation or computer network, where the remote computer
system may range from a single computer server
workstation to a local area or distributed network. Furthermore, it
should be understood that the system and method of the present
invention may be applied, without departing from the spirit
of the invention, to a self contained single computer
workstation having long-term data storage. Finally, it should be
noted that the system and method of the present invention
are completely language independent and may thus be
applied and utilized with any language.

Referring initially to FIG. 1, a profiling/search system 10
for automatically generating personalized user profiles and
for utilizing the generated profiles to perform adaptive
Internet or computer data searches is shown. As shown, the
profiling/search system 10 includes a local computer system
12, connected to a remote computer network 30 via a
telecommunication link 26. The local computer system 12
includes a control unit 14, such as a CPU and related
circuitry for controlling the operation of the local computer
system 12 and for executing application programs; a
memory 16, such as random access memory, for temporarily
storing control program instructions and variables during the
execution of application programs by the control unit 14; a
storage memory 18, such as flash memory or a disk drive, for
long term storage of data and application programs; and
input device(s) 20 for accepting input from the user, that
include at least one of the following input devices: a
keyboard, a selection device (i.e. mouse, trackball, or
touchpad), and a voice recognition device with speech to
text capabilities.

The local computer system 12 further includes: output
device(s) 22 for providing output data to the user, that
include at least one of the following output devices: a
display unit such as a CRT monitor or flat panel display, a
printer, and a text to speech device with sound output
capabilities; and a communication device 24 for transmitting
to, and receiving data from, the remote computer system 30
via the telecommunication link 26, such as a modem or other
telecommunication device. The telecommunication link 26
may be a standard telephone line, a DSL line, a high speed
data transmission line such as a T1 or T3 line, or a wireless
telecommunication (i.e. cellular or radio) link. The local
computer system 12 may be a generally conventional
desktop personal computer, an informational kiosk, or a portable
computer such as a laptop or a personal digital assistant
(PDA).

The remote computer system 30 may be any remote
computer system such as a single computer server or a
network of interconnected computer systems, such as a local
area or a wide area network. The remote computer system 30
includes a communication gateway 28, such as a modem
and/or a network router connected to the telecommunication
link 26; a remote data storage system 32 for long term data
storage; and a remote computer system control unit 34
(hereinafter RCS control unit 34), such as a single CPU and
associated devices, when the remote computer system 30 is
a single computer server, or a set of independent CPUs and
associated devices when the remote computer system 30 is
a network of interconnected computer systems. The remote
data storage system 32 may be a single data storage device,
such as a disk drive, or a distributed data storage system over
a plurality of separate interconnected computer systems each
having individual data storage units (not shown).

In an embodiment of the present invention depicted in
FIG. 1, the remote computer system 30 is preferably the
Internet (hereinafter, the remote computer system 30 is
interchangeably referred to as the Internet 30). Before
describing the present invention in greater detail, it is helpful
to briefly describe the Internet and related concepts. Simply
stated, the Internet is a massive collection of individual
networks operated by government, industry, academia, and
private parties that are linked together to
exchange information. While originally, the Internet was
used mostly by scientists, the advent of the World Wide Web
has brought the Internet into mainstream use. The World
Wide Web (hereafter "WWW") is an international,
virtual-network-based information service composed of Internet
host computers that provide on-line information in a specific
hypertext format. WWW host servers provide hypertext
markup language (HTML) formatted documents using the
hypertext transfer protocol (HTTP). Information on the WWW is
accessed with a hypertext browser such as Netscape
Navigator or Microsoft Internet Explorer. Web sites are collections of
interconnected WWW documents.

Assuming the remote computer system 30 is the Internet,
certain functional explanation is necessary for the
communication gateway 28, the remote data storage system 32, and
the RCS control unit 34. The communication gateway 28
may be implemented and controlled by an Internet service
provider (i.e. an ISP), a company that offers the user of the
local computer system 12 access to the Internet 30 and the
WWW through a software application stored in storage
memory 18 that controls communication between the
communication device 24 and the communication gateway 28.
Typically, the user can access and navigate the WWW using
a hypertext browser application residing on the local
computer system 12. The remote data storage system 32 is not a
single device, but is representative of the storage devices
that are used by the multitude of Internet host computers and
networks (not shown). The RCS control unit 34 is
representative of a plurality of control units of the multitude of
Internet host computers and networks (not shown).

No hierarchy exists in the WWW, and the same
information may be found by many different approaches. Hypertext
links in WWW HTML documents allow readers to move
from one place in a document to another (or even between
documents) as they want to. One of the advantages of WWW
is that there is no predetermined order that must be followed
in navigating through various WWW documents. Readers
can explore new sources of information by linking from
place to place. This linking has been made as easy as
clicking a mouse button on the subject a user wants to
access. Each WWW document also has a unique uniform
resource locator ("URL") that serves as an "address" that,
when followed, leads the user to the document or file
location on the WWW. Using the browser, the user can also
mark and store "favorites", that is, URLs of particular WWW
documents that interest the user, such that the user can
quickly and easily return to these documents in the future by
selecting them from the favorites list in the browser.

Because of the vastness of the Internet and the WWW,
locating specific information desired by the user can be very
difficult. To facilitate the search for information, a number of
"search engines" have been developed and implemented. A
search engine is a software application that searches the
Internet for web sites containing information on the subject
in which the user is interested. These searches are
accomplished in a variety of ways, all well known in the art.
Typically, a user first inputs a "search string" to the hypertext
browser containing key words representative of the
information desired by the user. The search engine then applies
the search string to a previously constructed index of a
multitude of web sites to locate a certain number of web sites
having content that matches the user's search string. The
located web site URLs are then presented to the user in the
order of relevance to the key words in the user's search
string.

However, as was noted earlier, typical key word and even
more advanced searches only provide the user with search
results that depend entirely on the search string entered by
the user, without any regard to the user's cultural,
educational, social backgrounds or the user's psychological
profiles. For example, a twelve year old child using key
word searches on the Internet for some information on
computers may be presented with a multitude of documents
that are far above the child's reading and educational level.

All texts composed by the user, or adopted by the user as
favorite or inimical (such as a favorite book or short story),
contain certain linguistic patterns, or combinations of
various parts of speech (nouns, verbs, adjectives, etc.) in
sentences that reflect the user's cultural, educational, social
backgrounds and the user's psychological profile. Research
has shown that most people have readily identifiable
linguistic patterns in their expression and that people with
similar cultural, educational, and social backgrounds will
have similar recurring linguistic patterns. In summary, in
accordance with the present invention, particular linguistic

central profile database located in a profile storage device
36, such as a storage memory device attached to a specific
Internet host computer, in the remote data storage system 32.
Storing User_Profiles in the central profile database is
advantageous because a user may be able to utilize his or her
User_Profile even when accessing the remote computer
system 30 from a computer other than the local computer
system 12.

Thus, at the test 102, the control unit 14 searches the
storage memory 18 to determine whether the local profile
database contains a User_Profile that has been previously
created for the user that has been identified at the step 100.
In addition, since the User_Profile may be also stored in the
central profile database in the profile storage device 36, at
the test 102 the control unit 14 also searches the profile
storage device 36 to determine whether the central profile
database contains a User_Profile that has been previously
created for the user. Optionally, if User_Profiles are stored

patterns and their frequencies of recurrence are extracted both in a local profile database and the central profile 

from the texts provided by the users of the system of the 20 database, the control unit 14 ensures that both User_Profiles 

present invention and stored in a user profile data file. The are identical to one another, by replacing an older User_ 

user profile data file is thus representative of the user's Profile with a newer one if the User__Pronles in each of the 

overall linguistic patterns. All documents in a remote com- databases differ from one another. 

puter system, such as the Internet, are likewise analyzed and If at the test 102, the control unit 14 determines that a 

their linguistic patterns and respective recurrence frequen- 25 User_Profile for the identified user does not exist, then a 

cies also extracted and stored in corresponding document new empty User„Profile is created at a step 104 and stored 

profiles. When a search for particular data is initiated by the i n the storage memory 18. At a test 106, the control unit 14 

user, linguistic patterns are also extracted from a search queries the user whether the user wishes to voluntarily 

string provided by the user into a search profile. The user contribute User_Data to the User__Profile. User_Data may 

profile is then cross matched with the search profile and the 30 be of two types — personal textual data generated by the user, 

document profiles to determine whether any linguistic pat- and favorite textual data generated by a source other that the 

terns match in all three profiles and to determine the mag- user. Personal textual data preferably consists of any docu- 

nitude of the match based on relative recurrence frequencies ments created and composed by the user and may include, 

of matching user and document linguistic patterns. The but is not limited to: books, articles, memorandums, essays, 

documents with document profiles having the highest 35 compositions, e-mails, reports, and web sites. Favorite tex- 

matching magnitudes are presented to the user as not only tual data preferably consists of any documents that were 

matching the subject of the search string, but also as created by a source other than the user but that the user has 

corresponding to the user's cultural, educational, and social adopted as being particularly interesting, fascinating, or 

backgrounds as well as the user's psychological profile. appealing, and may include, but is not limited to books, 

Thus, a world renowned physicist searching for information 40 articles, memorandums, essays, compositions, e-mails, 

on quasars would be presented with very sophisticated reports, and web sites. Furthermore, a user with an existing 

physics documents that are oriented to wards his level of User__Profile may initiate the user profiling control program 

j expertise. from the test 106 when the user wishes to update his or her 

Referring now to FIG. 2, a logic flow diagram represent- profile by supplying additional User_Data to the control 

I ing a user profiling control program for the control unit 14 45 unit 14. 

of FIG. 1 in accordance with a preferred embodiment of the At the test 106, the user preferably instructs the control 

I present invention is shown. As a matter of design choice, one unit 14 to acquire all of personal textual data stored in the 

j or more of the steps of the user profiling control program storage memory 18, for example by scanning the user's 

! may be executed by the RCS control unit 34 without "sent" e-mail folders, document directories and any direc- 

! departing from the spirit of the present invention. The 50 tories with any other documents that the user identifies as 

purpose of the user profiling control program is to generate personal textual data. Alternately, the user may identify 

or update a User_Profile computer file representative of the specific personal documents to be used as personal textual 

; user's linguistic patterns (and thus representative of the items. TTie user may also instruct the control unit 14 to 

j user's social, educational, and cultural background, as well acquire selected favorite textual data from documents iden- 

as of the user's psychological profile). 55 titled by the user as "favorite" that are stored in the storage 

The user profiling control program begins at a step 100 memory 18, or instruct the control unit 14 to retrieve WWW 

where the user's identity is verified by the control unit 14, documents from the remote data storage system 32 of the 

for example by asking the user to provide a password or Internet 30 in accordance with the URLs stored in the 

some form of a biometric identifier such as a fingerprint, a "favorites" section of the browser. In addition, the user may 

voice sample or a retinal image to the input device 20. At a 60 identify additional WWW documents to the control unit 14 

test 102, the control unit 14 determines whether a User_ as favorite textual data, such that the control unit 14 retrieves 

Profile has been previously generated for the user. Because these additional documents and adds them to User_Data. 

1 a particular local computer system 12 may be used by Furthermore, the user may specify, to the control unit 14, 

multiple users, a variety of Userjrofiles, one for each certain long texts such as full text classical books stored on 

individual user, may be stored in the storage memory 18 in 65 the Internet 30 as being favorite textual data. For example, 

a local profile database. In addition to, or instead of, the local the user may regard Homer's Illiad as his favorite book and 

profile database, User_Profiles may be stored in a remote thus identify it as favorite textual data. Both personal and 
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favorite textual data are stored in User_JData as Text_ 
Items — i.e. individual text documents. 

User_Data is preferably structured as a computer data file
that contains a number of sequential individual Text_Items
that are separated from one another by a delimiter readily
identifiable by the control unit 14. The quantity and quality
of User_Data provided by the user are directly proportional
to the quality, accuracy and usefulness of the User_Profile
that will be based on the User_Data. Thus, the user is
encouraged to provide as much personal and favorite textual
data as possible. It should be noted that although the user
may submit very personal texts as personal textual data to
the control unit 14, as will be explained below in connection
with FIG. 3, the User_Profile does not contain any private
information about the user nor does it contain any textual
excerpts from the user's private texts. Instead, as was
previously explained, the control unit 14 extracts linguistic
patterns from the texts rather than the actual information
conveyed by the texts. Of course, in some circumstances the
user may not have any personal or favorite textual data
stored in the storage memory 18, for example if the local
computer system 12 is brand new. Also, it is possible that
the user may not have enough knowledge of the Internet to
specify any favorite web sites or on-line documents. In both
cases the user may be unable to provide any User_Data to
the control unit 14. It should be noted that after the
completion of the user profiling control program, the
User_Data file is purged by the control unit 14.

If the control unit 14 determines at the test 106 that
User_Data is to be contributed by the user, then at a step
108, the control unit 14 acquires User_Data, including
personal and favorite Text_Items identified by the user at
the test 106, from the storage memory 18 and/or from the
remote data storage system 32 of the Internet 30. The control
unit 14 then proceeds to a test 118. If, on the other hand, the
control unit 14 determines at the test 106 that User_Data is
not to be contributed (for example, if the user does not
have any data stored in the storage memory 18), then the
control unit 14 proceeds to a step 110.

Returning to the test 102, if the control unit 14 determines
that a User_Profile for the identified user already exists,
then at the step 110 the user begins an Internet browsing
session using a hypertext browser (such as Netscape or
Explorer). During a browsing session, the user may navigate
through a variety of web sites, HTML documents or other
types of Text_Items. In an alternate embodiment of the
present invention, when the remote computer system 30 is
not the Internet, at the step 110 the user may begin using any
software application that may be installed on the local
computer system 12 and that is configured for searching for
data and/or for navigating through a plurality of data files,
i.e., Text_Items.

It should be noted that steps 112-116 may be performed
by the control unit 14 substantially simultaneously.
Furthermore, it should be noted that steps 112 and 114 are
optional. At a step 112, the control unit 14 begins to monitor
the user's browsing session initiated at the step 110 for the
entire duration of the browsing session. If the user spends
more than a pre-determined period of time "M" viewing a
particular Text_Item, then the control unit 14 adds the
Text_Item to User_Data. In effect, by spending more than
a particular period of time browsing a Text_Item, the user
has adopted the Text_Item as one of the user's favorite
textual items. Preferably, the control unit 14 accumulates a
total duration of time Q that each Text_Item is viewed by
the user over a predetermined period P. If during the period
P, Q exceeds the period M, then the control unit 14 adds the
Text_Item to User_Data. The time period P is preferably 24
hours, but may be as long as one week, or longer. The period
M may be one or more hours and is preferably set in
accordance with the period P. Thus, for example, if P is set
to 24 hours, M is preferably set between one and two hours,
while if P is set to one week M may be set to five to ten
hours. To illustrate the operation of the step 112, assuming
P is set to 24 hours and M is set to two hours, if the user
views a particular Text_Item for a total of more than two
hours (viewing time Q is greater than M) during the 24 hour
period, then the control unit 14 adds the viewed Text_Item
to User_Data.
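
The viewing-time rule of the step 112 can be illustrated with a minimal Python sketch. The class name DwellTracker, its method names, the use of hours as the time unit, and the treatment of the period P as a sliding window ending at the current time are all assumptions made for illustration; the description above only requires that Q be accumulated over the period P and compared against M.

```python
from collections import defaultdict


class DwellTracker:
    """Accumulates viewing time Q per Text_Item over a period P (hours)."""

    def __init__(self, period_p=24.0, threshold_m=2.0):
        self.period_p = period_p          # length of the observation period P
        self.threshold_m = threshold_m    # adoption threshold M
        self._views = defaultdict(list)   # item id -> [(start_hour, duration)]

    def record_view(self, item_id, start_hour, duration):
        """Log one viewing interval for a Text_Item."""
        self._views[item_id].append((start_hour, duration))

    def total_q(self, item_id, now):
        """Sum durations of views that started within the last P hours."""
        window_start = now - self.period_p
        return sum(d for start, d in self._views.get(item_id, [])
                   if start >= window_start)

    def is_adopted(self, item_id, now):
        """True when Q exceeds M, i.e. the item should join User_Data."""
        return self.total_q(item_id, now) > self.threshold_m
```

With P set to 24 hours and M set to two hours, an item viewed for 1.5 hours and later for one more hour within the same day would be adopted, while the same item would no longer qualify once the earlier view falls outside the window.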

At a step 114, the control unit 14 monitors the operation
of the browser, such that when the user adds any Text_Item
to the browser's "favorites" section, the control unit 14
automatically adds the Text_Item to User_Data. For
example, if the user visits a web site and becomes
interested enough in the site's material to add the
web site (Text_Item) to the favorites section of the browser,
the control unit 14 adds the Text_Item to User_Data.

At a step 116, the control unit 14 monitors the operation 
of the browser to automatically add, as Text_Items to 
User_Data, any search strings that the user inputs into the 
browser. Thus, for example, when the user utilizes the 
browser's search capabilities to search for "computer that 
mimics human thinking process and artificial intelligence 
and neural network", the control unit 14 adds this search 
string to User_Data as a Text_Item. 

At the optional test 118, the control unit 14 determines if
the User_Profile should be updated. If the User_Profile file
was created at the step 104, then a determination of whether
sufficient User_Data has been accumulated at the step 108
or the steps 112-116 may be required. Preferably, the control
unit 14 counts the total number of words in all Text_Items
in User_Data and compares the total to a predetermined
word count threshold. If the total number of words in
User_Data exceeds the word count threshold, then
User_Data is sufficient for updating the User_Profile, and the
control unit 14 proceeds to the step 120. On the other hand,
if the total number of words in User_Data is below the word
count threshold, then User_Data is insufficient for updating
the initial User_Profile. The control unit 14 then returns
to the step 110 where the user may continue the browsing
session so that the control unit 14 may continue to
accumulate additional Text_Items for User_Data at the steps
112 to 116.

This approach is advantageous because it ensures that the
User_Profile is based on sufficient linguistic data provided
by the user before its utilization. If a new User_Profile
based on insufficient linguistic data is used, it may provide
inaccurate results. The word count threshold may be selected
as a matter of design choice, keeping in mind that the
magnitude of User_Data is proportional to the accuracy of
the User_Profile derived from User_Data. For example, the
threshold total may be set between 1000 and 3000 words.
Alternatively, instead of counting the total number of words
in User_Data, the control unit 14 may count a total of all
Text_Items in User_Data and compare that total to another
threshold. For example, the threshold may be set to twenty
Text_Items.
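
The sufficiency check of the test 118 can be sketched as follows. The function name is_user_data_sufficient, its parameter names, the default of 2000 words (chosen from within the 1000 to 3000 range mentioned above), and the whitespace-based word count are illustrative assumptions. The text does not state how a total exactly equal to the threshold is handled; this sketch treats it as insufficient.

```python
def is_user_data_sufficient(text_items, word_threshold=2000,
                            item_threshold=None):
    """Decide whether User_Data is large enough to build a User_Profile.

    text_items     -- list of strings, one per Text_Item
    word_threshold -- minimum total word count that must be exceeded
    item_threshold -- if given, count Text_Items instead of words
    """
    if item_threshold is not None:
        # Alternative criterion: number of Text_Items in User_Data.
        return len(text_items) > item_threshold
    # Default criterion: total number of whitespace-separated words.
    total_words = sum(len(item.split()) for item in text_items)
    return total_words > word_threshold
```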

If, at the test 102, the control unit 14 determined that a
User_Profile for the user already exists, then the control unit
14 determines whether the existing User_Profile should be
updated. If frequent updating of the existing User_Profile is
undesirable (for example to conserve computing resources),
then update criteria for updating the User_Profile may be
set as a matter of design choice. The update criteria may
include, but are not limited to: a particular period of time
between updates, for example updating no more than once
per 24 hours, or addition of a particular number of words to
User_Data during the steps 112 to 116 and/or the step 108
if the user voluntarily contributed Text_Items to User_Data
to update the existing User_Profile. For example, this
particular number of words may be 500 or more. If the
update criteria are not met, then the control unit 14 returns to
the step 110. If, on the other hand, frequent updating of the
existing User_Profile is desired or if the update criteria have
been met, then the control unit 14 proceeds to the step 120.

At the step 120, the control unit 14 performs a profile
procedure subroutine to update the User_Profile.
Subroutines are known in the computer programming art as
functions designed to perform specific tasks requested by a
main control program. As a matter of design choice, one or
more of the steps of the profile procedure subroutine may be
executed by the RCS control unit 34 without departing from
the spirit of the present invention. One of the advantages of
using subroutines is that two or more programs can use the
same subroutine to perform a particular function. Modern
programming techniques also encompass programmable
"objects" which function similarly to subroutines. The main
advantage of programmable "objects" is that once an
"object" is developed to perform a particular function, it
may be used in any program wishing to use that function.
The purpose of the profile procedure subroutine is to
compose/update the User_Profile by analyzing and
extracting linguistic patterns from the Text_Items in
User_Data and adding the extracted linguistic patterns to the
User_Profile.

Referring now to FIG. 3, the profile procedure subroutine
begins at a step 200 and proceeds to a step 202 where the
control unit 14 retrieves and opens the User_Profile from
the storage memory 18. At a step 204, the control unit 14
retrieves the first Text_Item from User_Data. At a step 206,
the control unit 14 separates the retrieved Text_Item into at
least one separate "sentence", that is, a collection of words
from which linguistic patterns will be extracted to form the
User_Profile. Most Text_Items are documents that consist
of a plurality of typical grammatical sentences separated by
"end of sentence" (hereinafter "EOS") punctuation marks,
such as periods, colons, and exclamation and question
marks. Thus, the control unit 14 can readily separate a
typical Text_Item into a number of separate sentences by
identifying each separate sentence as a set of words ending
in an EOS punctuation mark.

Other Text_Items, such as search strings, may not have
any EOS punctuation marks and may be of significant
length. Furthermore, certain compound sentences, such as
patent claims, may contain multiple clauses and may also be
of significant length. Preferably, a maximum sentence word
count is defined as L as a matter of design choice. For
example, L may be set to fifty words. The control unit 14
analyzes the Text_Item and counts words until an EOS
punctuation mark is reached; if the word count reaches L and
an EOS punctuation mark is not reached, then the control unit
14 identifies the L words as a sentence (i.e. as if an EOS
punctuation mark was actually reached at L words) and
begins a new word count for the next sentence. For example,
if the Text_Item is a 158 word search string, and L is set to
fifty words, then this Text_Item will be separated into four
sentences with fifty words in each of the first three sentences
and eight words in the fourth sentence. If an EOS punctuation
mark is reached before the word count reaches L, then
the control unit 14 first identifies the words before the EOS
punctuation mark as a sentence and then begins a new word
count for the next sentence.
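
The sentence separation of the step 206 can be sketched in Python as below. The function name split_into_sentences and the constant EOS_MARKS are hypothetical, and the sketch makes simplifying assumptions: words are whitespace-separated tokens, and any token ending in a period, colon, exclamation mark or question mark closes a sentence (so abbreviations such as "FIG." would be split incorrectly).

```python
EOS_MARKS = ".:!?"  # periods, colons, exclamation and question marks


def split_into_sentences(text, max_words=50):
    """Split a Text_Item into sentences of at most max_words (L) words.

    A sentence ends at an EOS punctuation mark, or after L words when
    no EOS mark has been reached. Returns a list of word lists.
    """
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in EOS_MARKS or len(current) == max_words:
            sentences.append(current)
            current = []        # begin a new word count
    if current:                 # trailing words without an EOS mark
        sentences.append(current)
    return sentences
```

Applied to a 158 word search string with no punctuation and L set to fifty, this yields four sentences of fifty, fifty, fifty and eight words, as in the example above.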

At a test 208, the control unit 14 determines whether all 
sentences from the Text_Item retrieved at the step 204, have 
been retrieved. Because sentences are retrieved at a later step 
210, during a first iteration of the test 208, where no 
sentences have been retrieved by the control unit 14 thus far, 
the control unit 14 proceeds directly to the step 210. During 
subsequent iterations, if all sentences have been retrieved 
from the current Text_Item, then the control unit 14 pro- 
ceeds to a test 220 (FIG. 4). If, on the other hand, not all 
sentences have been retrieved, then the control unit 14 
proceeds to the step 210. 

At the step 210, the control unit 14 retrieves the first 
sentence identified at the step 206 (or, during subsequent 
iterations of this step, retrieves the next sentence). The
control unit 14 then identifies and tags each word in the
retrieved sentence as a particular part of speech (hereinafter
"POS"), i.e. a noun, pronoun, verb, etc. To simplify further
processing of the POS, after tagging the POS, the control 
unit 14 automatically brings all verbs to simple present 
tense, and brings all nouns to singular form. For example in 
a sentence "Joe walked to his beautiful home", the control 
unit 14 would tag "Joe" and "home" as nouns, "walk" as a 
verb, "to" as a preposition, "his" as a pronoun, and "beau- 
tiful" as an adjective. However, since for the purpose of 
performing data searches only a few POS are necessary, the 
control unit 14 preferably only identifies and tags certain 
predetermined POS such as nouns, verbs and adjectives. 

This procedure is performed in accordance with
standardized rules of grammar. Automatic identification of
parts of speech in a sentence is well known in the art and need
not be described herein. For example, many conventional word
processors utilize grammar checking functions that are
capable of identifying parts of speech in a sentence. The
particular POS that are identified and tagged by the control
unit 14 may include, but are not limited to: nouns, pronouns,
verbs, adverbs, adjectives, gerunds, prepositions,
conjunctions and interjections. As noted above, during the
step 210, after tagging the POS, the control unit 14
automatically brings all verbs to simple present tense and
all nouns to singular form.

At a test 212, the control unit 14 analyzes each word in the
sentence and determines if it is a unique POS, i.e. whether
it can be used as only one part of speech. Certain words
may be used as different parts of speech, for example, the 
word "police" may be used both as a noun and as a verb. 
This determination may be done with reference to a dictio- 
nary stored in the storage memory 18 or in the remote data 
storage system 32. If the word is a unique POS, then the 
control unit 14 proceeds to a step 216. Otherwise, if the word 
is not a unique POS, then the control unit 14 proceeds to a 
step 214, where the control unit 14 tags the word with 
multiple tags in accordance with its possible POS usage. For 
example, the word "police" would be tagged as a noun and 
as a verb. 
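
The dictionary lookup of the test 212 and the multiple tagging of the step 214 can be sketched as follows. TOY_DICTIONARY is a hypothetical stand-in for the dictionary stored in the storage memory 18 or the remote data storage system 32, and the function names are assumptions for illustration; a real system would rely on a full lexicon or a grammar-based tagger.

```python
# Hypothetical miniature dictionary: word -> set of possible POS.
TOY_DICTIONARY = {
    "joe": {"noun"},
    "home": {"noun"},
    "beautiful": {"adjective"},
    "police": {"noun", "verb"},   # usable as more than one POS
}


def possible_tags(word, dictionary=TOY_DICTIONARY):
    """Return every POS tag the dictionary lists for the word (step 214)."""
    return sorted(dictionary.get(word.lower(), set()))


def is_unique_pos(word, dictionary=TOY_DICTIONARY):
    """Test 212: True when the word has exactly one possible POS."""
    return len(possible_tags(word, dictionary)) == 1
```

In this sketch a word missing from the dictionary receives no tags and is therefore not treated as a unique POS; how unknown words are handled is not specified in the text.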

At the step 216, the control unit 14 extracts one or more
segments from the sentence retrieved at the step 210 that are
representative of the linguistic patterns of the sentence. A
segment consists of one or more predetermined types of
POS arranged in a predetermined order. The number, the
type, and the order of POS in a segment may be selected as
a matter of design choice, depending on the purpose for
which the User_Profile will be utilized. For the purpose of
performing data searches, preferably each segment is a triad
(i.e. N=3, where N is the number of POS in the segment)
arranged as follows: noun-verb-adjective. Thus, in
accordance with this embodiment, at the earlier step 210 the
control unit 14 identifies and tags only nouns, verbs and
adjectives, and at the step 216 the control unit 14 extracts
noun-verb-adjective segments from each sentence.

Alternately, the following other arrangements may be
used for the segment if desired: noun-adverb-adjective;
gerund-verb-adjective; gerund-adverb-adjective;
pronoun-verb-adjective; pronoun-adverb-adjective.
Accordingly, the appropriate POS used in the segment would
need to have been previously tagged by the control unit 14 at
the step 210. Furthermore, in an alternate embodiment of the
present invention, the segments may consist of one or more
POS.

Because a sentence may contain multiple POS of the same 
type, i.e. two nouns, several segments may potentially be 
composed by the control unit 14 from a single sentence. 
Thus, in accordance with the present invention, the control 
unit 14 extracts every possible noun-verb-adjective segment 
from the sentence. For example, if the sentence is "Joe 
walked to his beautiful new house", then the control unit 14 
would extract the following segments therefrom: 

Joe-walk-beautiful

Joe-walk-new

house-walk-beautiful

house-walk-new

However, if a particular sentence is missing one of the three
POS (noun, verb, adjective) required in the segment, then
the control unit 14 inserts a "blank" flag (for example the
characters "<>") into the position of the missing POS. For
example, if the sentence is "Joe walked to his house", then
the control unit 14 would extract the following segments
therefrom:

Joe-walk-<>

house-walk-<>

The blank flag "<>" was inserted by the control unit 14 into
the position of the adjective POS that was not present in the
sentence.
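
The segment extraction of the step 216 amounts to taking every combination of one noun, one verb and one adjective from the tagged sentence. The following Python sketch illustrates this under stated assumptions: the function name extract_segments, the input format (a list of pairs of a normalized word and its set of POS tags) and the constant BLANK are hypothetical, and a word carrying multiple tags is simply allowed to fill more than one position.

```python
from itertools import product

BLANK = "<>"  # flag inserted where a required POS is missing


def extract_segments(tagged_words, pattern=("noun", "verb", "adjective")):
    """Build every segment matching the pattern from one tagged sentence.

    tagged_words -- list of (normalized_word, set_of_pos_tags) pairs
    pattern      -- ordered POS types that make up a segment
    """
    slots = []
    for pos in pattern:
        words = [word for word, tags in tagged_words if pos in tags]
        slots.append(words or [BLANK])   # blank flag for a missing POS
    # Cartesian product over the slots yields every possible segment.
    return ["-".join(combo) for combo in product(*slots)]


sentence = [("Joe", {"noun"}), ("walk", {"verb"}),
            ("beautiful", {"adjective"}), ("new", {"adjective"}),
            ("house", {"noun"})]
segments = extract_segments(sentence)
```

For the tagged sentence shown, segments holds the four segments listed above, in the same order.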

At a step 218, the control unit 14 temporarily stores all 
segments extracted at the step 216 in the User_Profile and 
then returns to the test 208. 

Referring now to FIG. 4, at the test 220, the control unit
14 determines if all Text_Items have been retrieved from
User_Data. If not all Text_Items have been retrieved, then
the control unit 14 returns to the step 204 where the control
unit 14 retrieves the next Text_Item. Otherwise, if the
control unit 14 determines that all Text_Items have been
retrieved from User_Data, then the control unit 14 proceeds
to a step 222.

Thus, in summary, during steps 204 to 220, the control
unit 14 retrieves all Text_Items from User_Data, splits
each Text_Item into sentences, analyzes each sentence to
extract segments representative of the sentence's linguistic
patterns and stores the extracted segments in User_Profile.

At the step 222, the control unit 14 groups identical
segments together into sets, counts the occurrence of
identical segments in each set, and then records the number of
identical segments in each set in User_Profile as the
User_Profile segment count (hereinafter "UP_SC") next to each
set of identical segments. For example, if the segment
"computer-execute-fast" appears twenty seven times in
User_Profile, the UP_SC for that segment would be
recorded next to that segment as "27". If the User_Profile
already contains an identical segment set with an existing
UP_SC, then the UP_SC determined at the step 222 is
added to the existing UP_SC. For example, if the
User_Profile already contains the segment set
"instruction-execute-fast" with UP_SC of 15, and at the step
222 the control unit 14 determines that five such segments were
extracted from User_Data during the steps 204 to 220, then
the control unit 14 adds the new UP_SC of 5 to the existing
UP_SC of 15 and records the new UP_SC of 20 next to the
segment set "instruction-execute-fast". A high UP_SC for a
segment is indicative of the relative importance of the
segment as a representation of the user's linguistic pattern.
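
The counting and merging of the step 222 maps naturally onto a multiset. The sketch below uses Python's collections.Counter; the function name merge_segment_counts and the representation of the User_Profile as a mapping from segment to UP_SC are assumptions for illustration, not the file layout of the described system.

```python
from collections import Counter


def merge_segment_counts(existing_profile, new_segments):
    """Add counts of newly extracted segments to the existing UP_SCs.

    existing_profile -- mapping of segment -> UP_SC already in the profile
    new_segments     -- list of segments extracted during steps 204 to 220
    """
    updated = Counter(existing_profile)
    updated.update(new_segments)   # counts each occurrence in the list
    return updated


profile = merge_segment_counts(
    {"instruction-execute-fast": 15},
    ["instruction-execute-fast"] * 5 + ["computer-execute-fast"] * 27,
)
```

Here the existing UP_SC of 15 for "instruction-execute-fast" becomes 20, and "computer-execute-fast" is recorded with a UP_SC of 27, matching the examples above.
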

At a step 224, the control unit 14 sorts the identical
segment groups in the User_Profile from the identical
segment group with the highest UP_SC to the segment
group with the lowest UP_SC. Thus, after the step 224, the
User_Profile may look as follows:

Segment                     UP_SC

computer-execute-fast       27
instruction-execute-fast    20
Joe-walk-<>                 5
police-follow-vigilant      1


The number of different segment sets that may be stored
in the User_Profile is practically limited only by the sizes of
the storage memory 18 and the remote data storage system
32, and the computing capabilities of the control unit 14 or
of the RCS control unit 34. However, experimentation has
shown that a very large number of segment sets in the
User_Profile offers diminishing returns as balanced against
the storage requirements for the User_Profile and the
computing power required for the control unit 14 in order to
effectively work with the User_Profile. Thus, preferably
only a certain number of segment sets with the highest
UP_SC should be stored in the User_Profile. As a result, at
a step 226, the control unit 14 saves only Y of the segments
having the highest UP_SCs to the User_Profile, deleting all
the remaining segments. For example, Y may be set to 5000,
such that only 5000 of the most commonly occurring
segments are saved by the control unit 14 to the
User_Profile. Alternatively, Y% of the most commonly occurring
segments may be saved to the User_Profile. For example, if
Y is set to 20, the control unit 14 may save the top 20% of
the segments with the highest UP_SCs.
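
The sorting of the step 224 and the truncation of the step 226 can be sketched together. The function name keep_top_segments and the as_percent switch are hypothetical, and rounding the percentage cut-off up to a whole number of segments is an assumption, since the text does not say how fractional counts are resolved.

```python
import math


def keep_top_segments(profile, y=5000, as_percent=False):
    """Sort segments by UP_SC (highest first) and keep only the top ones.

    profile    -- mapping of segment -> UP_SC
    y          -- number of segments to keep, or a percentage if as_percent
    as_percent -- interpret y as "top Y%" rather than an absolute count
    """
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    limit = math.ceil(len(ranked) * y / 100) if as_percent else y
    return ranked[:limit]
```

The absolute limit keeps storage bounded regardless of how much User_Data was supplied, whereas the percentage form scales with the size of the profile.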

At a step 228, the control unit 14 returns the updated
User_Profile to the user profiling control program (FIG. 2).

Returning now to FIG. 2, at a step 122, the control unit 14
stores the updated User_Profile in at least one of the local
profile database in the storage memory 18 and the central
profile database in the profile storage device 36. Preferably,
the User_Profile is stored "confidentially", i.e. encrypted
and protected by a password or by other access control
means such as biometrics (e.g. a fingerprint scan, voice
pattern matching, etc.) such that only the user can access and
update his or her User_Profile. The control unit 14 then ends
the user profiling control program at a step 124, or optionally
returns to the step 110, where the user can continue the
browsing session.

Referring now to FIG. 5, a logic flow diagram representing
a data profiling control program for the control unit 14
of FIG. 1 in accordance with a preferred embodiment of the
present invention is shown. Data_Item refers to any
document, whether flat text or hypertext, that may be a target
during a potential data search by the user. Accordingly,
Data_Items include all documents that are stored in the
remote data storage system 32 on the remote computer
system 30. For example, if the remote computer system 30
is the Internet, all web sites on all Internet 30 host computers
as well as all documents stored on file transfer protocol
(FTP) sites are Data_Items. The purpose of the data
profiling control program is to generate Data_Item_Profile
computer files representative of linguistic patterns of all
Data_Items that may be subjected to a search by the user.
In an alternate embodiment, if the profiling/search system 10
includes only the local computer system 12, Data_Items
may include all documents stored in storage memory 18.
Preferably, the data profiling control program is executed by
the RCS control unit 34. However, in the alternate
embodiment where the profiling/search system 10 includes only
the local computer system 12, the data profiling control program
is executed by the control unit 14.

The RCS control unit 34 begins the data profiling control 
program at a step 300 and proceeds to a step 302, where the 
RCS control unit 34 retrieves the first Data_Jtem from the 
remote data storage system 32. Preferably, to simplify the 
operation of the data profiling control program, prior to 
execution of the step 300, a list of the "addresses" of all 
Data_Items (hereinafter "Data_Item_Addresses") that are 
stored on the remote data storage system 32, is obtained 
from a typical indexing search engine. The address list 
enables the RSC control unit 34 to readily retrieve all 
Data_Jtems by sequentially following each address on the 
address list and retrieving the corresponding Data__Item. 
Indexing search engines, such as spiders or robots, which 
compile lists of addresses of all Internet documents/web 
sites are well known in the art and need not be described in 
detail herein. For example, there are companies that provide 
lists of all web sites on the Internet to various search engine 
providers. It should be noted that for the purpose of the
present invention, indexing of the Data_Items is
unnecessary; only a list of Data_Item_Addresses is
required.

At a step 304, the RCS control unit 34 creates a
Data_Item_Profile data file and stores the Data_Item_Address of
the Data_Item retrieved at the step 302 therein. The
Data_Item_Profile is preferably stored in a remote central profile
database located in a profile storage device 36. At a step 306,
the RCS control unit 34 composes a Data_Item_Record for
the Data_Item by retrieving all textual data, i.e. Text_Items,
from the Data_Item itself. If the Data_Item is a hypertext
document (e.g. a web site) with hypertext links on the "front
page" to additional documents, then the RCS control unit 34
also follows the links and retrieves, into the
Data_Item_Record, all Text_Items that are linked to the front page.
Thus, while a standard text Data_Item may contain only a
single Text_Item, a hypertext Data_Item may contain a
plurality of Text_Items.

At a step 308, the RCS control unit 34 retrieves the first
Text_Item from the Data_Item_Record. At a step 310, the
RCS control unit 34 separates the retrieved Text_Item into
at least one separate "sentence", i.e. a collection of words from
which linguistic patterns will be extracted to form the
Data_Item_Profile. As was noted before, most Text_Items
are documents that consist of a plurality of typical
grammatical sentences separated by EOS punctuation marks,
such as periods, colons, and exclamation and question
marks. Thus, the RCS control unit 34 can readily separate a
typical Text_Item into a number of separate sentences by
identifying each separate sentence as a set of words ending
in an EOS punctuation mark.

Other Text_Items may not have any EOS punctuation
marks and may be of significant length. Furthermore, certain
compound sentences, such as patent claims, may contain
multiple clauses and may also be of significant length.
Preferably, a maximum sentence word count L is defined
as a matter of design choice. For example, L may be set to
fifty words. The RCS control unit 34 analyzes the Text_Item
and counts words until an EOS punctuation mark is reached; if the
word count reaches L and an EOS punctuation mark is not
reached, then the RCS control unit 34 identifies the L words
as a sentence (i.e. as if an EOS punctuation mark was
actually reached at L words) and begins a new word count
for the next sentence. For example, if the Text_Item is a
158-word patent claim, and L is set to fifty words, then this
Text_Item will be separated into four sentences with fifty
words in each of the first three sentences and eight words in
the fourth sentence. If an EOS punctuation mark is reached
before the word count reaches L, then the RCS control unit
34 identifies the words before the EOS punctuation mark as
a sentence and begins a new word count for the next
sentence.
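
The sentence-splitting rule of step 310 can be illustrated with a minimal Python sketch. The function name split_sentences, the EOS_MARKS constant and the whitespace-based word splitting are assumptions made for illustration only; the specification does not prescribe any particular implementation, and a practical system would need to handle abbreviations such as "i.e." that contain periods.

```python
# Illustrative sketch only: names and tokenization are assumptions.
EOS_MARKS = ".:!?"  # periods, colons, exclamation and question marks


def split_sentences(text: str, max_words: int = 50) -> list[list[str]]:
    """Split text into "sentences" of at most max_words words each.

    A sentence ends at an EOS punctuation mark or, failing that,
    once the word count reaches max_words (the limit L).
    """
    sentences: list[list[str]] = []
    current: list[str] = []
    for token in text.split():
        word = token.rstrip(EOS_MARKS)
        if word:
            current.append(word)
        ends_sentence = token[-1] in EOS_MARKS
        # Close the sentence at an EOS mark or when the cap L is reached.
        if current and (ends_sentence or len(current) == max_words):
            sentences.append(current)
            current = []
    if current:  # trailing words with no closing EOS mark
        sentences.append(current)
    return sentences


# A 158-word text without punctuation, as in the patent claim example.
claim = " ".join(f"w{i}" for i in range(158))
lengths = [len(s) for s in split_sentences(claim, max_words=50)]
```

With L set to fifty, the 158-word input yields sentence lengths of 50, 50, 50 and 8, matching the example given above.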

At a test 312, the RCS control unit 34 determines whether 
all sentences from the Text_Item retrieved at the step 308, 
have been retrieved. Because sentences are retrieved at a 
later step 314, during a first iteration of the test 312, where 
no sentences have been retrieved by the RCS control unit 34 
thus far, the RCS control unit 34 proceeds directly to the step 
314. During subsequent iterations, if all sentences have been 
retrieved from the current Text_Item, then the RCS control 
unit 34 proceeds to a test 324 (FIG. 6). If, on the other hand, 
not all sentences have been retrieved, then the RCS control 
unit 34 proceeds to the step 314. 

At the step 314, the RCS control unit 34 retrieves the first
sentence identified at the step 310 (or, during subsequent
iterations of this step, retrieves the next sentence). The RCS
control unit 34 then identifies and tags each word in the
retrieved sentence as a particular part of speech (hereinafter
"POS"), i.e. a noun, pronoun, verb, etc. To simplify further
processing of the POS, after tagging the POS the RCS
control unit 34 automatically brings all verbs to simple
present tense, and brings all nouns to singular form. For
example, in a sentence "John walked to his beautiful home",
the RCS control unit 34 would tag "John" and "home" as 
nouns, "walked" as a verb, "to" as a preposition, "his" as a 
pronoun, and "beautiful" as an adjective. However, since for 
the purpose of performing data searches only a few POS are 
necessary, the RCS control unit 34 preferably only identifies 
and tags certain predetermined POS such as nouns, verbs 
and adjectives. 

This procedure is performed in accordance with
standardized rules of grammar. Automatic identification of parts of
speech in a sentence is well known in the art and need not
be described herein. For example, many conventional word
processors utilize grammar checking functions that are
capable of identifying parts of speech in a sentence. The
particular POS that are identified and tagged by the RCS
control unit 34 may include, but are not limited to: noun,
pronoun, verb, adverb, adjective, gerund, preposition,
conjunction and interjection.

At a test 316, the RCS control unit 34 analyzes each word
in the sentence and determines if it is a unique POS. Certain
words may be used as different parts of speech; for example,
the word "police" may be used both as a noun and as a verb.
This determination may be done with reference to a
dictionary stored in the remote data storage system 32. If the word
is a unique POS, then the RCS control unit 34 proceeds to 
a step 320. Otherwise, if the word is not a unique POS, then 
the RCS control unit 34 proceeds to a step 318, where the 
RCS control unit 34 tags the word with multiple tags in 
accordance with its possible POS usage. For example, the 
word "police" would be tagged as a noun and as a verb. 
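
The dictionary lookup of the test 316 and the multiple tagging of the step 318 might be sketched in Python as follows. The POS_DICTIONARY contents and the function names are hypothetical stand-ins for the dictionary stored in the remote data storage system 32; they are not taken from the specification.

```python
# Hypothetical miniature dictionary: each word maps to every POS it may take.
POS_DICTIONARY = {
    "police": {"noun", "verb"},
    "joe": {"noun"},
    "walk": {"verb"},
    "beautiful": {"adjective"},
}


def tag_word(word, dictionary=POS_DICTIONARY):
    """Return all POS tags for a word, sorted; empty list if unknown."""
    return sorted(dictionary.get(word.lower(), ()))


def is_unique_pos(word, dictionary=POS_DICTIONARY):
    """Test 316: True when the word has exactly one possible POS."""
    return len(dictionary.get(word.lower(), ())) == 1


police_tags = tag_word("police")  # both tags are kept, as in step 318
```

Under these assumptions, "police" receives both the noun and the verb tag, while "walk" is treated as a unique POS.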

At the step 320, the RCS control unit 34 extracts one or
more segments from the sentence retrieved at the step 314
that are representative of the linguistic patterns of the
sentence. A segment consists of one or more predetermined
types of POS arranged in a predetermined order. The
number, the type and the order of POS in a segment may be
selected as a matter of design choice, depending on the
purpose for which the Data_Item_Profile will be utilized.
For the purpose of performing data searches, preferably each
segment is a triad (i.e. N=3) of three POS arranged as
follows: noun-verb-adjective. Thus, in accordance with this
embodiment, previously, at the step 314, the RCS control
unit 34 only identifies and tags nouns, verbs and adjectives,
and at the step 320 the RCS control unit 34 extracts
noun-verb-adjective segments from each sentence.

Alternatively, the following other arrangements may be
used for the segment if desired: noun-adverb-adjective;
gerund-verb-adjective; gerund-adverb-adjective;
pronoun-verb-adjective; pronoun-adverb-adjective. Accordingly, the
appropriate POS used in the segment would need to have
been previously tagged by the RCS control unit 34 at the
step 314. Furthermore, in an alternate embodiment of the
present invention, the segments may consist of one or more
POS.

Because a sentence may contain multiple POS of the same
type, i.e. two nouns, several segments may potentially be
composed by the RCS control unit 34 from a single
sentence. Thus, in accordance with the present invention, the
RCS control unit 34 extracts every possible
noun-verb-adjective segment from the sentence. For example, if the
sentence is "Joe walked to his beautiful new house", then the
RCS control unit 34 would extract the following segments
therefrom:

Joe-walk-beautiful

Joe-walk-new

house-walk-beautiful

house-walk-new

However, if a particular sentence is missing one of the three
POS (noun, verb, adjective) required in the segment, then
the RCS control unit 34 inserts a "blank" flag (for example
the characters "O") into the position of the missing POS.
For example, if the sentence is "Joe walked to his house",
then the RCS control unit 34 would extract the following
segments therefrom:

Joe-walk-O

house-walk-O

The blank flag "O" was inserted by the RCS control unit
34 into the position of the adjective POS that was not
present in the sentence.
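
The segment extraction of the step 320 amounts to forming every combination of one noun, one verb and one adjective from the sentence, with a blank flag standing in for any missing POS. A minimal Python sketch follows; the function name extract_segments and the (word, POS) input format are assumptions for illustration, and the words are assumed to have already been tagged and normalized at the step 314.

```python
from itertools import product

BLANK = "O"  # blank flag inserted for a missing POS


def extract_segments(tagged_words, pattern=("noun", "verb", "adjective")):
    """Return every segment matching the POS pattern.

    tagged_words: list of (word, pos) pairs, already normalized
    (verbs in simple present tense, nouns in singular form).
    """
    slots = []
    for pos in pattern:
        words = [word for word, tag in tagged_words if tag == pos]
        # A missing POS contributes the blank flag instead of a word.
        slots.append(words or [BLANK])
    return ["-".join(combo) for combo in product(*slots)]


sentence = [
    ("Joe", "noun"), ("walk", "verb"),
    ("beautiful", "adjective"), ("new", "adjective"), ("house", "noun"),
]
segments = extract_segments(sentence)
# segments == ["Joe-walk-beautiful", "Joe-walk-new",
#              "house-walk-beautiful", "house-walk-new"]
```

Changing the pattern argument, for example to ("noun", "adverb", "adjective"), would give the alternative arrangements mentioned above, provided the corresponding POS were tagged beforehand.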

At a step 322, the RCS control unit 34 temporarily stores
all segments extracted at the step 320 in the
Data_Item_Profile and then returns to the test 312.

Referring now to FIG. 6, at the test 324, the RCS control
unit 34 determines if all Text_Items have been retrieved
from Data_Item_Record. If all Text_Items have not been
retrieved, then the RCS control unit 34 returns to the step
308 where the RCS control unit 34 retrieves the next
Text_Item. Otherwise, if the RCS control unit 34
determines that all Text_Items have been retrieved from
Data_Item_Record, then the RCS control unit 34 proceeds to a
test 326. At the test 326, the RCS control unit 34 determines
whether all Data_Items have been retrieved from the
Data_Item_Address list. If all Data_Items have been retrieved,
then the RCS control unit 34 proceeds to a step 328. If all
Data_Items have not been retrieved, then the RCS control
unit 34 returns to the step 302, where the RCS control unit
34 retrieves the next Data_Item.

Thus, in summary, during steps 302 to 326, the RCS
control unit 34 sequentially retrieves Data_Items from a
previously composed Data_Item_Address list, and for each
Data_Item, the RCS control unit 34 retrieves all
Text_Items from Data_Item_Record, splits each Text_Item into
sentences, analyzes each sentence to extract segments
representative of the sentence's linguistic patterns, and stores
the extracted segments in Data_Item_Profile.

At the step 328, the RCS control unit 34 groups identical
segments together into sets, counts the occurrence of
identical segments in each set, and then records the number of
identical segments in each set in Data_Item_Profile as
Data_Item_Profile segment count (hereinafter "DIP_SC")
next to each set of identical segments. For example, if the
segment "science-advance-medical" appears twenty-five
times in Data_Item_Profile, the DIP_SC for that segment
would be recorded next to that segment as "25". If the
Data_Item_Profile already contains an identical segment
set with an existing DIP_SC, then the DIP_SC determined
at the step 328 is added to the existing DIP_SC. For
example, if the Data_Item_Profile already contains the
segment set "cure-develop-great" with DIP_SC of 15, and
at the step 328 the RCS control unit 34 determines that five
such segments were extracted from Data_Item_Record
during the steps 302 to 324, then the RCS control unit 34
adds the new DIP_SC of 5 to the existing DIP_SC of 15
and records the new DIP_SC of 20 next to the segment set
"cure-develop-great". A high DIP_SC for a segment is
indicative of the relative importance of the segment as a
representation of the Data_Item's linguistic pattern.
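
The grouping and counting of the step 328 can be sketched with a Python Counter, which maps each distinct segment to its DIP_SC and adds new counts to any existing ones. The function name update_profile is an assumption for illustration; the specification does not name a data structure for the Data_Item_Profile.

```python
from collections import Counter


def update_profile(profile: Counter, new_segments) -> Counter:
    """Add the counts of newly extracted segments to the profile.

    profile maps each segment to its DIP_SC; identical segments are
    grouped and their count is added to any existing DIP_SC.
    """
    profile.update(Counter(new_segments))
    return profile


# Existing profile already holds "cure-develop-great" with a DIP_SC of 15.
profile = Counter({"cure-develop-great": 15})
extracted = ["cure-develop-great"] * 5 + ["science-advance-medical"] * 25
update_profile(profile, extracted)
# profile["cure-develop-great"] == 20
# profile["science-advance-medical"] == 25
```

This reproduces the worked figures above: five new occurrences added to an existing DIP_SC of 15 give 20.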

At a step 330, the RCS control unit 34 sorts the identical
segment groups in the Data_Item_Profile from the identical
segment group with the highest DIP_SC to the segment
group with the lowest DIP_SC. Thus, after the step 330, the
Data_Item_Profile may look as follows:

Segment                      DIP_SC

science-advance-medical      25

cure-develop-great           20

police-follows-vigilant      2

The number of different segment sets that may be stored
in the Data_Item_Profile is practically limited only by the
remote data storage system 32, and the computing
capabilities of the RCS control unit 34. However, experimentation
has shown that a very large number of segment sets in the
Data_Item_Profile offers diminishing returns as balanced
against the storage requirements for the Data_Item_Profile
and the computing power required for the RCS control unit
34 in order to effectively work with the Data_Item_Profile.
Thus, preferably only a certain number of segment sets with
the highest DIP_SC should be stored in the
Data_Item_Profile. As a result, at a step 332, the RCS control unit 34
saves only X of the segments having the highest DIP_SCs
to the Data_Item_Profile, deleting all the remaining
segments. For example, X may be set to 5000, such that only
5000 of the most commonly occurring segments are saved
by the RCS control unit 34 to the Data_Item_Profile.
Alternatively, X% of the most commonly occurring
segments may be saved to the Data_Item_Profile. For
example, if X is set to 15, the RCS control unit 34 may save
the top 15% of the segments with the highest DIP_SCs. At
a step 334, the RCS control unit 34 stores the
Data_Item_Profile in the central profile database in the profile storage
device 36.
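
The sorting of the step 330 and the truncation of the step 332 might look as follows in Python. The function name truncate_profile and its parameters are illustrative assumptions, as is the choice to round a percentage cut-off up to the next whole segment; the specification leaves such details open.

```python
import math


def truncate_profile(profile, top_n=None, top_percent=None):
    """Sort segments by descending DIP_SC and keep only the top ones.

    top_n:       keep the X segments with the highest counts, or
    top_percent: keep the top X% (rounded up; an assumption).
    If neither is given, the full sorted profile is returned.
    """
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    if top_percent is not None:
        top_n = math.ceil(len(ranked) * top_percent / 100)
    return ranked[:top_n]


profile = {
    "cure-develop-great": 20,
    "police-follows-vigilant": 2,
    "science-advance-medical": 25,
}
top_two = truncate_profile(profile, top_n=2)
# top_two == [("science-advance-medical", 25), ("cure-develop-great", 20)]
```

Segments with equal counts keep their original relative order, because Python's sort is stable.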

Referring now to FIG. 7, a logic flow diagram representing
a data searching control program for the RCS control
unit 34 of FIG. 1 in accordance with a preferred embodiment
of the present invention is shown. In an alternate 
embodiment, a data searching control program may instead 
be executed by the control unit 14 of FIG. 1. The purpose of 
the data searching control program is to enable a user to 
utilize the profiling/searching system 10 to perform 
advanced searches for desired data files, such that the data 
files returned as search results correspond to the user's 
educational, cultural, social backgrounds and to the user's 
psychological profile. This is accomplished by ensuring that 
linguistic patterns of the data files presented to the user 
substantially correspond to the user's linguistic patterns. 
Typically, the data searching control program will be utilized 
by the user during a data browsing session, such as per- 
formed by the user at the step 110 (FIG. 1). 

The RCS control unit 34 begins the data searching control 
program at a step 400 and proceeds to a step 402, where the 
user provides a Search_String consisting of a number of 
words representative of the subject matter of the data desired 
by the user and of any limiting information to further narrow 
the search, to the RCS control unit 34. The user may enter 
the Search_String manually by using the input device 20 
such as a keyboard, or alternatively, the user may utilize a 
speech recognition input device 20 to enter the
Search_String via vocalization.

At a step 404, the RCS control unit 34 creates a
Search_Profile data file that, at the completion of steps 406 to 420,
will be representative of the linguistic patterns of the
Search_String. At a step 406, the RCS control unit 34
separates the Search_String into at least one separate
"sentence", i.e. a collection of words from which linguistic
patterns will be extracted to form the Search_Profile. Most
Search_Strings may not have any EOS punctuation marks
and may be of significant length. Thus it may be difficult to
isolate and identify individual sentences within the
Search_String. Thus, preferably, a maximum sentence word count
for the Search_String is defined as W as a matter of design
choice. For example, W may be set to twenty words. The
RCS control unit 34 analyzes the Search_String and counts
words until an EOS punctuation mark is reached; if the word count
reaches W and an EOS punctuation mark is not reached, then
the RCS control unit 34 identifies the W words as a sentence
(i.e. as if an EOS punctuation mark was actually reached at
W words) and begins a new word count for the next sentence.
For example, if the Search_String is 65 words, and W is set
to twenty words, then this Search_String will be separated
into four sentences with twenty words in each of the first
three sentences and five words in the fourth sentence. If an
EOS punctuation mark is reached before the word count
reaches W, then the RCS control unit 34 identifies the words
before the EOS punctuation mark as a sentence and begins
a new word count for the next sentence.

At a test 408, the RCS control unit 34 determines whether 
all sentences have been retrieved from the Search_String. 
Because sentences are retrieved at a later step 410, during a 
first iteration of the test 408, where no sentences have been 
retrieved by the RCS control unit 34 thus far, the RCS 
control unit 34 proceeds directly to the step 410. During 
subsequent iterations, if all sentences have been retrieved 
from the Search_String, then the RCS control unit 34 
proceeds to a step 424 (FIG. 8). If, on the other hand, not all 
sentences have been retrieved, then the RCS control unit 34 
proceeds to the step 410. 

At the step 410, the RCS control unit 34 retrieves the first
sentence identified at the step 406 (or, during subsequent
iterations of this step, retrieves the next sentence). The RCS
control unit 34 then identifies and tags each word in the
retrieved sentence as a particular POS, i.e. a noun,
pronoun, verb, etc. To simplify further processing of the
POS, after tagging the POS the RCS control unit 34
automatically brings all verbs to simple present tense, and brings
all nouns to singular form. For example, in a sentence "Joe
walked to his beautiful home", the RCS control unit 34
would tag "Joe" and "home" as nouns, "walked" as a verb, "to"
as a preposition, "his" as a pronoun, and "beautiful" as an
adjective. However, since for the purpose of performing data
searches only a few POS are necessary, the RCS control unit
34 preferably only identifies and tags certain predetermined
POS such as nouns, verbs and adjectives.

This procedure is performed in accordance with
standardized rules of grammar. Automatic identification of parts of
speech in a sentence is well known in the art and need not
be described herein. For example, many conventional word
processors utilize grammar checking functions that are
capable of identifying parts of speech in a sentence. The
particular POS that are identified and tagged by the RCS
control unit 34 may include, but are not limited to: noun,
pronoun, verb, adverb, adjective, gerund, preposition,
conjunction and interjection.

At a test 412, the RCS control unit 34 analyzes each word
in the sentence and determines if it is a unique POS. Certain
words may be used as different parts of speech; for example,
the word "police" may be used both as a noun and as a verb.
This determination may be done with reference to a
dictionary stored in the remote data storage system 32. If the word
is a unique POS, then the RCS control unit 34 proceeds to 
a step 416. Otherwise, if the word is not a unique POS, then 
the RCS control unit 34 proceeds to a step 414, where the 
RCS control unit 34 tags the word with multiple tags in 
accordance with its possible POS usage. For example, the 
word "police" would be tagged as a noun and as a verb. 

At the step 416, the RCS control unit 34 extracts one or 
more segments from the sentence retrieved at the step 410 
that are representative of the linguistic patterns of the 
sentence. A segment consists of one or more predetermined 
types of POS arranged in a predetermined order. The 
number, the type and the order of POS in a segment may be 
selected as a matter of design choice. For the purpose of 
performing data searches, preferably each segment is a triad 
(i.e. N=3) of three POS arranged as follows: noun-verb- 
adjective. Thus, in accordance with this embodiment, 
previously, at the step 410, the RCS control unit 34 only 
identifies and tags nouns, verbs and adjectives, and at the 
step 416 the RCS control unit 34 extracts noun-verb- 
adjective segments from each sentence. 

Alternatively, the following other arrangements may be
used for the segment if desired: noun-adverb-adjective;
gerund-verb-adjective; gerund-adverb-adjective;
pronoun-verb-adjective; pronoun-adverb-adjective. Accordingly, the
appropriate POS used in the segment would need to have
been previously tagged by the RCS control unit 34 at the
step 410. Furthermore, in an alternate embodiment of the
present invention, the segments may consist of one or more
POS. It should be noted that whatever selections were
previously made for the segment arrangement (i.e. N, types
of POS, positions of POS) for User_Profile (FIGS. 3-4) and
the Data_Item_Profiles (FIGS. 5-6), the same arrangement
should be selected for the segments for the Search_Profile.

Because a sentence may contain multiple POS of the same
type, i.e. two nouns, several segments may potentially be
composed by the RCS control unit 34 from a single
sentence. Thus, in accordance with the present invention, the
RCS control unit 34 extracts every possible
noun-verb-adjective segment from the sentence. For example, if the
sentence is "computers run advanced expensive software",
then the RCS control unit 34 would extract the following
segments therefrom:

computer-run-advanced

computer-run-expensive

software-run-advanced

software-run-expensive

However, if a particular sentence is missing one of the three
POS (noun, verb, adjective) required in the segment, then
the RCS control unit 34 inserts a "blank" flag (for example
the characters "O") into the position of the missing POS.
For example, if the sentence is "computers execute
software", then the RCS control unit 34 would extract the
following segments therefrom:

computer-execute-O

software-execute-O

The blank flag "O" was inserted by the RCS control unit
34 into the position of the adjective POS that was not
present in the sentence. The RCS control unit 34 then stores
the extracted segments in the Search_Profile.

For increased accuracy in searching, particularly in the
case where the Search_String is very small, optional steps
418 and 420 may be performed by the RCS control unit 34.
At a step 418, the RCS control unit 34 determines synonyms
for each word in each segment extracted at the step 416. For
example, for the segment "computer-works-fast", the RCS
control unit 34 determines the following synonyms for
"computer": PC, calculator, mainframe, CPU, processor; the
following synonyms for "work": operate, function, labor,
accomplish; and the following synonyms for "fast": quick,
speedy, rapid, swift, prompt. Automatic determination of
synonyms is well known in the art, and is implemented as a
thesaurus function in most word processing software
programs.

At a step 420, the RCS control unit 34 composes a
plurality of alternate segments for each segment stored in
Search_Profile utilizing different combinations of
synonyms determined at the step 418. For example, for the
segment "computer-works-fast", the RCS control unit 34
composes at least the following alternate segments:

PC-operate-quick

PC-function-rapid

CPU-operate-swift

etc.

The RCS control unit 34 then stores the alternate
segments in the Search_Profile and returns to the test 408.

Referring now to FIG. 8, at a step 424, the RCS control
unit 34 retrieves the User_Profile of the user initiating the
search at the step 400 from one of the local profile database
and the central profile database. At a step 426, the RCS
control unit 34 compares the segments stored in the
User_Profile with the segments stored in the Search_Profile to
determine a number of matches between various segments in
each of the profiles and then retrieves the UP_SC for each
of the matched segments from the User_Profile. For
example, if the User_Profile contains the following
segments, along with the UP_SCs in parentheses:

Joe-walk-beautiful (34),

Joe-walk-new (25),

computer-execute-advanced (10),

police-protect-watchful (8),

man-walk-happy (7),

computer-buy-expensive (3);

and the Search_Profile contains the following segments:

computer-execute-advanced,

computer-buy-expensive,

intelligence-compute-artificial;

then the RCS control unit 34 would determine two matches
between the User_Profile and the Search_Profile:
"computer-execute-advanced" and "computer-buy-expensive",
and would retrieve the corresponding UP_SCs,
10 and 3, respectively.

At an optional step 428, the RCS control unit 34 applies
the Search_String to a predetermined standard search
engine to return and retrieve a list of Data_Item_Addresses
of Z number of Data_Items that potentially match the user's
search requirements. The step 428 is optional because the
search approach of the present invention, described in
greater detail below in connection with steps 432 to 438, can
be applied directly to all Data_Items stored on the remote
data storage system 32, without first narrowing the list of
Data_Items to be searched by using a standard search
engine. However, given the processing capabilities of
modern computers, a direct search of all Data_Items stored on
the remote data storage system 32 using the principles of the
present invention may be lengthy and therefore impractical.
Thus, it may be preferable to utilize a standard search
engine, such as, for example, AltaVista, to first return a
smaller list of Z Data_Item_Addresses of Data_Items that
directly match the Search_String provided by the user at the
step 402.

The advantageous approach of the present invention,
illustrated in steps 432 to 438, may then be applied to the
smaller list of Data_Items to select and present to the user
the particular Data_Items that not only correspond to
the user's expressed search requirements, but that also match the
user's linguistic patterns and that therefore correspond to the
user's cultural, educational, and psychological profile.
At a step 430, the RCS control unit 34 retrieves
the Data_Item_Profiles corresponding to the Z
Data_Item_Addresses retrieved at the step 428. This is readily
accomplished by the RCS control unit 34 because each
Data_Item_Profile contains the Data_Item_Address of
the corresponding Data_Item from which the
Data_Item_Profile was constructed.

At a step 432, the RCS control unit 34 compares the
segments in the Search_Profile with the segments in each of
the retrieved Data_Item_Profiles to determine and identify
the matches between the segments in the Search_Profile and
the segments in each of the Data_Item_Profiles. The RCS
control unit 34 then retrieves the DIP_SC for each segment
in each of the Data_Item_Profiles that matches a
corresponding segment in the Search_Profile.

At a step 434, the RCS control unit 34 determines a
MATCH_VALUE for each segment in each
Data_Item_Profile that also appears in the User_Profile and in the
Search_Profile, by adding the UP_SC of that segment to
the DIP_SC of that segment. For example, if a particular
segment appears 23 times in the User_Profile (UP_SC=23)
and appears 5 times in the Data_Item_Profile (DIP_SC=5),
then the MATCH_VALUE of this segment in the
Data_Item_Profile is 23+5 or 28.

At a step 436, the RCS control unit 34 determines a
FINAL_VALUE for each Data_Item_Profile by adding
the MATCH_VALUEs of all segments in the
Data_Item_Profile that also appear in the User_Profile and in the
Search_Profile. The FINAL_VALUE is representative of
the degree to which the linguistic pattern of the Data_Item
matches the linguistic pattern of the user in light of the
linguistic pattern of the Search_String.

At a step 438, the RCS control unit 34 retrieves the
Data_Item_Addresses corresponding to M
Data_Item_Profiles with the highest FINAL_VALUEs and presents a
list of the M Data_Item_Addresses to the user in order of
descending magnitude of their corresponding
FINAL_VALUEs. The number M of the Data_Item_Addresses
presented may be selected as a matter of design choice. For
example, M may be set to 10 or 20. At an optional step 440,
the RCS control unit 34 automatically retrieves and opens,
for the user, the Data_Item corresponding to the
Data_Item_Profile with the highest FINAL_VALUE.
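
The cross-matching of steps 426 to 438 can be summarized in a short Python sketch. The function name rank_data_items, the dictionary-based profile layout and the example addresses and DIP_SC values are assumptions made for illustration; only the UP_SC figures and the addition rule come from the description above.

```python
def rank_data_items(user_profile, search_profile, data_item_profiles, m=10):
    """Return up to m (address, FINAL_VALUE) pairs, highest value first.

    user_profile:       dict mapping segment -> UP_SC
    search_profile:     set of segments extracted from the Search_String
    data_item_profiles: dict mapping Data_Item_Address -> {segment: DIP_SC}
    """
    # Step 426: segments present in both User_Profile and Search_Profile.
    shared = {seg: up_sc for seg, up_sc in user_profile.items()
              if seg in search_profile}

    final_values = {}
    for address, profile in data_item_profiles.items():
        # Step 434: MATCH_VALUE = UP_SC + DIP_SC for each segment
        # that appears in all three profiles.
        match_values = [up_sc + profile[seg]
                        for seg, up_sc in shared.items() if seg in profile]
        # Step 436: FINAL_VALUE is the sum of the MATCH_VALUEs.
        final_values[address] = sum(match_values)

    # Step 438: the M highest FINAL_VALUEs in descending order.
    ranked = sorted(final_values.items(), key=lambda item: item[1],
                    reverse=True)
    return ranked[:m]


user_profile = {
    "Joe-walk-beautiful": 34, "Joe-walk-new": 25,
    "computer-execute-advanced": 10, "police-protect-watchful": 8,
    "man-walk-happy": 7, "computer-buy-expensive": 3,
}
search_profile = {
    "computer-execute-advanced", "computer-buy-expensive",
    "intelligence-compute-artificial",
}
# Hypothetical addresses and DIP_SC values, for illustration only.
data_item_profiles = {
    "http://a.example/": {"computer-execute-advanced": 5},
    "http://b.example/": {"computer-execute-advanced": 2,
                          "computer-buy-expensive": 4},
    "http://c.example/": {"man-walk-happy": 50},
}
top = rank_data_items(user_profile, search_profile, data_item_profiles, m=2)
# top == [("http://b.example/", 19), ("http://a.example/", 15)]
```

In this example the third Data_Item scores zero despite a high DIP_SC, because its only segment does not appear in the Search_Profile.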

Thus, while there have been shown and described and pointed
out fundamental novel features of the invention as applied to
preferred embodiments thereof, it will be understood that
various omissions and substitutions and changes in the form
and details of the devices and methods illustrated, and in
their operation, may be made by those skilled in the art
without departing from the spirit of the invention. For
example, it is expressly intended that all combinations of
those elements and/or method steps which perform
substantially the same function in substantially the same way to
achieve the same results are within the scope of the
invention. It is the intention, therefore, to be limited only as
indicated by the scope of the claims appended hereto.

I claim: 

1. A data processing method for enabling a user utilizing
a local computer system having a local data storage system
to locate desired data from a plurality of data items stored in
a remote data storage system in a remote computer system,
the remote computer system being linked to the local
computer system by a telecommunication link, the method
comprising the steps of:

(a) extracting, by one of the local computer system and
the remote computer system, a user profile from user
linguistic data previously provided by the user, said
user data profile being representative of a first linguistic
pattern of the said user linguistic data;

(b) constructing, by the remote computer system, a
plurality of data item profiles, each plural data item profile
corresponding to a different one of each plural data item
stored in the remote data storage system, each of said
plural data item profiles being representative of a
second linguistic pattern of a corresponding plural data
item, each said plural second linguistic pattern being
substantially unique to each corresponding plural data
item;

(c) providing, by the user to the local computer system,
search request data representative of the user's
expressed desire to locate data substantially pertaining
to said search request data;

(d) extracting, by one of the local computer system and
the remote computer system, a search request profile
from said search request data, said search request
profile being representative of a third linguistic pattern
of said search request data;

(e) determining, by one of the local computer system and
the remote computer system, a first similarity factor
representative of a first correlation between said search
request profile and said user profile by comparing said
search request profile to said user profile;

(f) determining, by one of the local computer system and
the remote computer system, a plurality of second
similarity factors, each said plural second similarity
factor being representative of a second correlation
between said search request profile and a different one
of said plural data item profiles, by comparing said
search request profile to each of said plural data item
profiles;

(g) calculating, by one of the local computer system and
the remote computer system, a final match factor for 
each of said plural data item profiles, by adding said 
first similarity factor to at least one of said plural 
second similarity factors in accordance with at least one 
intersection between said first correlation and said 
second correlation; 

(h) selecting, by one of the local computer system and the 
remote computer system, one of said plural data items 
corresponding to a plural data item profile having a 
highest final match factor; and 

(i) retrieving, by one of the local computer system and the 
remote computer system from the remote data storage 
system, said selected data item for display to the user, 
such that the user is presented with a data item having 
linguistic characteristics that substantially correspond 
to linguistic characteristics of the linguistic data gen- 
erated by the user, whereby the linguistic characteris- 
tics of the data item correspond to the user's social, 
cultural, educational, economic background as well as 
to the user's psychological profile. 

2. The method of claim 1, further comprising the step of: 
(j) prior to said step (a), automatically adding, by one of
the local computer system and the remote computer
system, textual data generated by the user during uti- 
lization of the local computer system to said user 
linguistic data. 

3. The method of claim 1, wherein said user linguistic data 
comprises at least one of: personal textual data generated by 
the user and favorite textual data generated by a source other 
than the user and that the user has adopted as being favorite. 

4. The method of claim 1, wherein said user linguistic data 
comprises at least one text item, each said at least one text 
item comprising at least one sentence. 

5. The method of claim 3, further comprising the step of: 
(k) prior to said step (a), selecting, by the user at least one
of said personal textual data and said favorite textual
data, from textual data stored in one of the local data 
storage system and the remote data storage system. 

6. The method of claim 1, further comprising the step of: 
(l) prior to said step (a), determining, by one of the local
computer system and the remote computer system,
whether an existing user data profile is stored in one of 
the local data storage system and the remote data 
storage system, and: 

1) when an existing user data profile is stored in one of 
the local data storage system and the remote data 
storage system, retrieving said existing user data 
profile and proceeding to said step (b); and 

2) when an existing user data profile is not stored in one 
of the local data storage system and the remote data 
storage system, proceeding to said step (a). 

7. The method of claim 4, wherein said step (a) comprises 
the steps of: 

(m) generating, by one of the local computer system and
the remote computer system, a user data profile;

(n) retrieving, by one of the local computer system and the
remote computer system, a text item from said user
linguistic data;

(o) separating, by one of the local computer system and
the remote computer system, said text item into at least
one sentence;

(p) extracting, from each of said at least one sentence, by 
one of the local computer system and the remote 
computer system, at least one segment representative of 
a linguistic pattern of each sentence of said at least one 
sentence;

(q) adding, by one of the local computer system and the
remote computer system, at least one segment extracted 
at said step (p) to said user data profile; 

(r) repeating, by one of the local computer system and the 
remote computer system, said steps (n) to (q) for each 
text item of said at least one text item in said user 
linguistic data; 

(s) generating at least one user segment group, by one of 
the local computer system and the remote computer 
system, by grouping together identical segments of said 
at least one segment; 

(t) determining a user segment count, by one of the local 
computer system and the remote computer system, for 
each user segment group of said at least one user 
segment group, each said user segment count being 
representative of a number of identical segments in the 
corresponding user segment group of said at least one 
user segment group, and linking each said user segment 
count to the corresponding user segment group of said 
at least one user segment group; 

(u) sorting the user segment groups of said at least one 
user segment group, by one of the local computer 
system and the remote computer system, in a descending
order of user segment counts starting from a user
segment group having a highest user segment count, 
and recording said user segment groups and corre- 
sponding user segment counts in said user data profile; 
and 

(v) storing, by one of the local computer system and the 
remote computer system, said user data profile, repre- 
sentative of said first linguistic pattern, in at least one 
of the local data storage system and the remote data 
storage system. 
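
The grouping, counting, and sorting of steps (s) to (u) can be illustrated with a minimal Python sketch. It assumes each segment is represented as a tuple of words; the function name `build_profile` and the sample segments are hypothetical and do not come from the claims.

```python
from collections import Counter

def build_profile(segments):
    """Group identical segments, count each group, and sort by count."""
    # Steps (s) and (t): one group per distinct segment, linked to its count.
    counts = Counter(segments)
    # Step (u): most_common() returns (segment, count) pairs,
    # ordered from the highest count to the lowest.
    return counts.most_common()

# Hypothetical segments extracted from a user's texts.
segments = [
    ("dog", "chased", "red"),
    ("cat", "slept", "_"),
    ("dog", "chased", "red"),
]
profile = build_profile(segments)
```

Here `profile` lists the repeated segment first with a count of 2, followed by the segment that occurred once.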

8. The method of claim 7, wherein said step (o) comprises 
the step of: 

(w) determining a word count by sequentially counting 
words of said text item; 

(x) when an end of sentence mark is reached before said 
word count reaches a predefined word limit, storing 
said counted words as a sentence, restarting said word 
count, and repeating said step (w) starting after a last 
word of said stored sentence; and 

(y) when said word count reaches said predefined word 
limit, storing said counted words as a sentence, restart- 
ing said word count, and repeating said step (w) starting 
after a last word of said stored sentence. 

9. The method of claim 8, wherein said end of sentence 
mark comprises one of: a period, an exclamation mark, and 
a question mark. 
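
The sentence separation of steps (w) to (y) can be sketched as follows. This is an illustration only: the function name `split_sentences`, the default limit of 20 words, and the handling of trailing words that reach neither an end of sentence mark nor the limit are assumptions not stated in the claims.

```python
END_MARKS = (".", "!", "?")  # end of sentence marks named in claim 9

def split_sentences(text, word_limit=20):
    """Split text into sentences by end mark or by word limit."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)  # step (w): count words sequentially
        # Step (x): an end mark closes the sentence early.
        # Step (y): reaching the word limit closes it regardless.
        if word.endswith(END_MARKS) or len(current) >= word_limit:
            sentences.append(" ".join(current))
            current = []  # restart the word count
    if current:
        # Assumption: leftover words are stored as a final sentence.
        sentences.append(" ".join(current))
    return sentences

result = split_sentences("Hello there. How are you today my friend", word_limit=4)
```

With a limit of four words, the sample text yields "Hello there.", "How are you today", and "my friend".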

10. The method of claim 7, wherein said step (p) com- 
prises the steps, performed for each sentence of said at least 
one sentence, of: 

(z) identifying and tagging each word in a sentence as one 
of a predetermined plurality of different parts of 
speech; and 

(aa) arranging a predetermined number of said tagged 
words in a predetermined order of said predetermined 
plural different parts of speech to compose at least one 
segment for each possible combination of said prede- 
termined number of said tagged words arranged in said 
predetermined order, said at least one segment being 
representative of a linguistic pattern of said sentence. 

11. The method of claim 10, further comprising the step
of:

(bb) after said step (z), determining whether each word 
may serve as an additional part of speech, and when a
word may serve as an additional part of speech, adding
an additional tag to said word to identify said word as 
said additional part of speech. 

12. The method of claim 10, wherein said predetermined 
plurality of different parts of speech comprises at least one 
of: noun, pronoun, verb, adverb, adjective, gerund,
preposition, conjunction and interjection.

13. The method of claim 10, wherein said predetermined 
plurality of different parts of speech comprises a noun, a 
verb and an adjective, wherein said predetermined number 
is three, and wherein said predetermined order is noun, verb, 
adjective. 

14. The method of claim 10, wherein said step (aa) further 
comprises the step of: 

(cc) when one of said predetermined plural different parts 
of speech is missing from said sentence, inserting a 
blank mark into said segment instead of said missing 
predetermined part of speech. 
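
The segment composition of claims 10, 13, and 14 can be sketched in Python for the noun, verb, adjective order. The toy lexicon `LEXICON`, the blank mark `"_"`, and the function name `extract_segments` are hypothetical; a real system would use a part-of-speech tagger in place of the lookup table.

```python
from itertools import product

ORDER = ("noun", "verb", "adjective")  # predetermined order of claim 13
BLANK = "_"                            # assumed representation of the blank mark

# Hypothetical toy lexicon standing in for a part-of-speech tagger.
# Each word maps to a set of tags, so a word may carry more than one.
LEXICON = {
    "dog": {"noun"},
    "ball": {"noun"},
    "chased": {"verb"},
    "red": {"adjective"},
}

def extract_segments(sentence):
    """Compose one segment per combination of tagged words, in ORDER."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    slots = []
    for pos in ORDER:
        candidates = [w for w in words if pos in LEXICON.get(w, ())]
        # Claim 14: a missing part of speech is replaced by a blank mark.
        slots.append(candidates or [BLANK])
    # One segment for each possible combination of the tagged words.
    return list(product(*slots))

segments = extract_segments("The dog chased the red ball.")
```

The sample sentence has two nouns, one verb, and one adjective, so it produces two segments. This sketch does not prevent a multiply tagged word from filling two slots of the same segment.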

15. The method of claim 7, wherein said step (v) further 
comprises the step of: 

(dd) encrypting said user data profile such that said 
encrypted user data profile may only be utilized when 
an authorization is received from the user. 

16. The method of claim 7, wherein said step (u) further 
comprises the step of: 

(ee) recording, in said user data profile, only a first 
predetermined portion of said at least one user segment 
groups having highest user segment counts. 

17. The method of claim 16, wherein said first predeter- 
mined portion comprises one of: 5,000 user segment groups 
and a top five percent of said at least one user segment 
groups. 
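
The truncation of claims 16 and 17 can be sketched as a slice over the sorted groups. The function name `truncate_profile` and its parameters are hypothetical, and rounding the five percent portion up to a whole group is an assumption.

```python
def truncate_profile(sorted_groups, max_groups=5000, top_percent=None):
    """Keep only the highest-count portion of an already sorted profile."""
    if top_percent is not None:
        # Integer ceiling of len * percent / 100 (assumed rounding rule),
        # keeping at least one group.
        keep = max(1, -(-len(sorted_groups) * top_percent // 100))
    else:
        keep = max_groups
    return sorted_groups[:keep]

# Hypothetical profile of 40 groups, already sorted by descending count.
groups = [("segment%d" % i, 100 - i) for i in range(40)]
top_five_percent = truncate_profile(groups, top_percent=5)
```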

18. The method of claim 7, wherein said step (b) further 
comprises a step of: 

(ff) for each plural data item, generating a data item record
comprising at least one text item from the data item, 
each said at least one text item comprising at least one 
sentence. 

19. The method of claim 18, wherein one of said at least 
one text items is a primary text item, and wherein said 
primary text item comprises at least one hyperlink to at least 
one additional text item, such that when said at least one 
hyperlink is activated, said at least one additional text item 
is thereby retrieved, further comprising the step of: 

(gg) retrieving, by the remote computer system, said at 
least one additional text item into said data item record. 

20. The method of claim 18, wherein said step (b) 
comprises the steps, performed for each plural data item, of: 

(hh) generating, by the remote computer system, a data 
item profile, said data item profile comprising a data 
item address representative of a location of said data 
item in the remote data storage system, such that said 
data item may be retrieved by providing said data item 
address to said remote computer system; 

(ii) retrieving, by the remote computer system, a text item 
from said data item record; 

(jj) separating, by the remote computer system, said text 
item into at least one sentence; 

(kk) extracting, from each of said at least one sentence, by 
the remote computer system, at least one segment 
representative of a linguistic pattern of each sentence of 
said at least one sentence; 

(ll) adding, by the remote computer system, at least one
segment extracted at said step (kk) to said data item
profile;

(mm) repeating, by the remote computer system, said
steps (ii) to (ll) for each text item of said at least one
text item in said data item record; 

(nn) generating at least one data segment group, by the 
remote computer system, by grouping together identi- 
cal segments of said at least one segment; 

(oo) determining a data item segment count, by the remote 
computer system, for each data segment group of said 
at least one data segment group, each said data item 
segment count being representative of a number of 
identical segments in the corresponding data segment 
group of said at least one data segment group, and 
linking each said data item segment count to the 
corresponding data segment group of said at least one 
data segment group; 

(pp) sorting the data segment groups of said at least one 
data segment group, by the remote computer system, in 
a descending order of data item segment counts starting
from a data segment group having a highest data
item segment count, and recording said data segment 
groups and corresponding data item segment counts in 
said data item profile; and 

(qq) storing, by the remote computer system, said data 
item profile, representative of one of said plural second 
linguistic patterns, in the remote data storage system. 

21. The method of claim 20, wherein said step (jj) 
comprises the step of: 

(rr) determining a word count by sequentially counting 
words of said text item; 

(ss) when an end of sentence mark is reached before said 
word count reaches a predefined word limit, storing 
said counted words as a sentence, restarting said word 
count, and repeating said step (rr) starting after a last 
word of said stored sentence; and 

(tt) when said word count reaches said predefined word 
limit, storing said counted words as a sentence, restart- 
ing said word count, and repeating said step (rr) starting 
after a last word of said stored sentence. 

22. The method of claim 21, wherein said end of sentence 
mark comprises one of: a period, an exclamation mark, and 
a question mark. 

23. The method of claim 20, wherein said step (kk) 
comprises the steps, performed for each sentence of said at 
least one sentence, of: 

(uu) identifying and tagging each word in a sentence as 
one of said predetermined plurality of different parts of 
speech; and 

(vv) arranging a predetermined number of said tagged 
words in a predetermined order of said predetermined 
plural different parts of speech to compose at least one 
segment for each possible combination of said prede- 
termined number of said tagged words arranged in said 
predetermined order, said at least one segment being 
representative of a linguistic pattern of said sentence. 

24. The method of claim 23, further comprising the step 
of: 

(ww) after said step (uu), determining whether each word 
may serve as an additional part of speech, and when a 
word may serve as an additional part of speech, adding 
an additional tag to said word to identify said word as 
said additional part of speech.

25. The method of claim 23, wherein said predetermined 
plurality of different parts of speech comprises at least one 
of: noun, pronoun, verb, adverb, adjective, gerund,
preposition, conjunction and interjection.

26. The method of claim 23, wherein said predetermined
plurality of different parts of speech comprises a noun, a 
verb and an adjective, wherein said predetermined number 
is three, and wherein said predetermined order is noun, verb, 
adjective. 

27. The method of claim 23, wherein said step (vv) further
comprises the step of: 

(xx) when one of said predetermined plural different parts 
of speech is missing from said sentence, inserting a 
blank mark into said segment instead of said missing 
predetermined part of speech. 

28. The method of claim 20, wherein said step (pp) further 
comprises the step of: 

(yy) recording, in said data item profile, only a second 
predetermined portion of said at least one data segment 
groups having highest data item segment counts. 

29. The method of claim 28, wherein said second prede- 
termined portion comprises one of: 5,000 data segment 
groups and a top five percent of said at least one data 
segment groups. 

30. The method of claim 20, wherein said step (d) 
comprises the steps of: 

(zz) generating, by one of the local computer system and 
the remote computer system, a search profile; 

(aaa) separating, by one of the local computer system and 
the remote computer system, said search request data 
into at least one sentence; 

(bbb) extracting, from each of said at least one sentence, 
by one of the local computer system and the remote 
computer system, at least one search segment repre- 
sentative of a linguistic pattern of each sentence of said 
at least one sentence; and 

(ccc) adding, by one of the local computer system and the 
remote computer system, at least one search segment 
extracted at said step (bbb) to said search profile, said 
search profile being representative of said third linguis- 
tic pattern of said search request data. 

31. The method of claim 30, wherein said step (aaa) 
comprises the step of: 

(ddd) determining a word count by sequentially counting 
words of said search request data; 

(eee) when an end of sentence mark is reached before said 
word count reaches a predefined word limit, storing 
said counted words as a sentence, restarting said word 
count, and repeating said step (ddd) starting after a last 
word of said stored sentence; and 

(fff) when said word count reaches said predefined word
limit, storing said counted words as a sentence, restart- 
ing said word count, and repeating said step (ddd) 
starting after a last word of said stored sentence. 

32. The method of claim 31, wherein said end of sentence 
mark comprises one of: a period, an exclamation mark, and 
a question mark. 

33. The method of claim 30, wherein said step (bbb) 
comprises the steps, performed for each sentence of said at 
least one sentence, of: 

(ggg) identifying and tagging each word in a sentence as 
one of said predetermined plurality of different parts of 
speech; and 

(hhh) arranging a predetermined number of said tagged 
words in a predetermined order of said predetermined 
plural different parts of speech to compose at least one 
segment for each possible combination of said prede- 
termined number of said tagged words arranged in said 
predetermined order, said at least one segment being 
representative of a linguistic pattern of said sentence.

34. The method of claim 33, further comprising the step
of: 

(iii) after said step (ggg), determining whether each word 
may serve as an additional part of speech, and when a 
word may serve as an additional part of speech, adding 
an additional tag to said word to identify said word as 
said additional part of speech. 

35. The method of claim 33, wherein said predetermined 
plurality of different parts of speech comprises at least one 
of: noun, pronoun, verb, adverb, adjective, gerund,
preposition, conjunction and interjection.

36. The method of claim 33, wherein said predetermined 
plurality of different parts of speech comprises a noun, a 
verb and an adjective, wherein said predetermined number 
is three, and wherein said predetermined order is noun, verb, 
adjective. 

37. The method of claim 33, wherein said step (hhh) 
further comprises the step of: 

(jjj) when one of said predetermined plural different parts 
of speech is missing from said sentence, inserting a 
blank mark into said segment instead of said missing 
predetermined part of speech. 

38. The method of claim 33, further comprising the steps 
of: 

(kkk) determining, by one of the local computer system 
and the remote computer system, at least one synonym 
for each word in each segment; 

(lll) composing, by one of the local computer system and
the remote computer system, a plurality of alternate 
search segments for each segment utilizing said 
synonyms, wherein said alternate search segments are 
composed in accordance with said predetermined order 
of said predetermined plural different parts of speech; 
and 

(mmm) recording, by one of the local computer system 
and the remote computer system, said plural alternate 
search segments in said search profile. 
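
The synonym expansion of steps (kkk) to (mmm) can be sketched as follows. The synonym table `SYNONYMS` and the function name `alternate_segments` are hypothetical; the claims do not specify where synonyms come from.

```python
from itertools import product

# Hypothetical synonym table; a real system might consult a thesaurus.
SYNONYMS = {
    "car": ["automobile"],
    "fast": ["quick", "rapid"],
}

def alternate_segments(segment):
    """Compose alternate segments by substituting synonyms in place."""
    # Each position keeps its part of speech, so the predetermined
    # order of the original segment is preserved.
    options = [[word] + SYNONYMS.get(word, []) for word in segment]
    original = tuple(segment)
    return [alt for alt in product(*options) if alt != original]

alternates = alternate_segments(("car", "drives", "fast"))
```

For the sample segment, two choices for the noun and three for the adjective give six combinations, five of which are alternates to the original.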

39. The method of claim 30, wherein said step (e) 
comprises the steps of: 

(nnn) retrieving, by one of the local computer system and 
the remote computer system, said user data profile from 
one of the local data storage system and the remote data 
storage system; and 

(ooo) comparing, by one of the local computer system and 
the remote computer system, said at least one user 
segment group to said at least one search segment, and 
recording said user segment counts of each user seg- 
ment group of said at least one user segment group that 
matches a corresponding search segment of said at least 
one search segment, said user segment counts being 
representative of said first similarity factor. 

40. The method of claim 39, wherein said step (f) com- 
prises the steps of: 

(ppp) for each plural data item, retrieving, by one of the 
local computer system and the remote computer 
system, a corresponding data item profile from the 
remote data storage system; and 

(qqq) for each plural data item profile, comparing, by one 
of the local computer system and the remote computer 
system, said at least one data segment group to said at 
least one search segment, and recording said data 
segment counts of each data segment group of said at 
least one data segment group that matches a corre- 
sponding search segment of said at least one search 
segment, said data segment counts being representative 
of said plural second similarity factor.

41. The method of claim 40, wherein said step (g)
comprises the steps of: 

(rrr) for each said plural data item profile, determining at
least one match value, by one of the local computer
system and the remote computer system, by first iden- 
tifying a data segment group in the plural data item 
profile that matches both a corresponding search seg- 
ment and a corresponding user segment group and then 
adding said user segment count of said corresponding 
user segment group to said data segment count of said 
identified data segment group, wherein when no 
matches are identified, said at least one match value is 
set to null; and 

(sss) for each said plural data item profile, determining a 
final match factor, by one of the local computer system 
and the remote computer system, by adding together all 
said at least one match values determined for said 
plural data item profile at said step (rrr). 
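
The calculation of steps (rrr) and (sss) can be sketched in Python. The sketch assumes that each profile is a dictionary mapping a segment to its count and that a null match value contributes zero to the sum; the function name `final_match_factor` and the sample data are hypothetical.

```python
def final_match_factor(search_segments, user_profile, item_profile):
    """Sum match values over segments present in all three profiles."""
    total = 0  # assumption: "null" match values are treated as zero
    for segment in set(search_segments):
        # Step (rrr): the segment must match the search profile, the
        # user profile, and the data item profile at the same time.
        if segment in user_profile and segment in item_profile:
            total += user_profile[segment] + item_profile[segment]
    # Step (sss): the final match factor is the sum of all match values.
    return total

seg_a = ("dog", "chased", "red")
seg_b = ("cat", "slept", "_")
user_profile = {seg_a: 5}
item_profile = {seg_a: 3, seg_b: 7}
factor = final_match_factor([seg_a, seg_b], user_profile, item_profile)
```

Only `seg_a` appears in all three profiles, so the factor is 5 + 3 = 8; `seg_b` is ignored because the user profile does not contain it.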

42. The method of claim 40, wherein said step (ppp) 
comprises the steps of: 

(ttt) applying, by the remote computer system, said search 
request data to a conventional data search engine, 
implemented in the remote computer system, to return 
a list of at least one data item address of at least one 
preliminary matching data item that potentially corre- 
sponds to said search request data; and 

(uuu) retrieving from the remote storage system, by one 
of the local computer system and the remote computer 
system, at least one data item profile corresponding to 
said at least one preliminary matching data item in said 
list. 

43. The method of claim 1, wherein said step (h) com- 
prises the steps of: 

(vvv) selecting, by one of the local computer system and
the remote computer system, a portion of said plural 
data items corresponding to a predetermined number of 
plural data item profiles having highest final match 
factors; and 
wherein said step (i) comprises the step of: 

(www) retrieving, by one of the local computer system 
and the remote computer system from the remote 
data storage system, said selected data items for 
display to the user, such that the user is presented 
with a group of data items having linguistic charac- 
teristics that substantially correspond to linguistic 
characteristics of the linguistic data generated by the 
user, whereby the linguistic characteristics of the 
data items correspond to the user's social, cultural, 
educational, economic background as well as to the 
user's psychological profile. 
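
The selection of step (vvv) can be sketched with the standard library. The sketch assumes the final match factors are held in a dictionary keyed by data item address; the function name `select_top_items` and the sample addresses are hypothetical.

```python
import heapq

def select_top_items(match_factors, n=10):
    """Return the n data item addresses with the highest match factors."""
    # nlargest iterates over the dictionary keys and ranks them by
    # their factor, highest first.
    return heapq.nlargest(n, match_factors, key=match_factors.get)

# Hypothetical final match factors keyed by data item address.
factors = {"item-a": 8, "item-b": 0, "item-c": 15}
selected = select_top_items(factors, n=2)
```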

44. A data processing method for enabling a user, utilizing 
a computer system having a data storage system, to locate 
desired data from a plurality of data items stored in the data 
storage system, the method comprising the steps of: 

(a) extracting, by the local computer system, a user profile 
from user linguistic data previously provided by the 
user, said user data profile being representative of a first 
linguistic pattern of the said user linguistic data; 

(b) constructing, by the computer system, a plurality of 
data item profiles, each plural data item profile corre- 
sponding to a different one of each plural data item 
stored in the data storage system, each of said plural 
data item profiles being representative of a second 
linguistic pattern of a corresponding plural data item, 
each said plural second linguistic pattern being sub- 
stantially unique to each corresponding plural data 
item;

(c) providing, by the user to the computer system, search
request data representative of the user's expressed 
desire to locate data substantially pertaining to said 
search request data; 

(d) extracting, by the computer system, a search request
profile from said search request data, said search 
request profile being representative of a third linguistic 
pattern of said search request data; 

(e) determining, by the computer system, a first similarity 
factor representative of a first correlation between said
search request profile and said user profile by compar- 
ing said search request profile to said user profile; 

(f) determining, by the computer system, a plurality of
second similarity factors, each said plural second
similarity factor being representative of a second
correlation between said search request profile and a different
one of said plural data item profiles, by comparing said 
search request profile to each of said plural data item 
profiles; 

(g) calculating, by the computer system, a final match 
factor for each of said plural data item profiles, by 
adding said first similarity factor to at least one of said 
plural second similarity factors in accordance with at 
least one intersection between said first correlation and
said second correlation; 

(h) selecting, by the computer system, one of said plural 
data items corresponding to a plural data item profile 
having a highest final match factor; and 

(i) retrieving, by the computer system from the data
storage system, said selected data item for display to 
the user, such that the user is presented with a data item 
having linguistic characteristics that substantially cor- 
respond to linguistic characteristics of the linguistic 
data generated by the user, whereby the linguistic
characteristics of the data item correspond to the user's 
social, cultural, educational, economic background as 
well as to the user's psychological profile. 

45. A data processing method for generating a user data 
profile representative of a user's social, cultural,
educational, economic background and of the user's psy- 
chological profile, the method being implemented in a 
computer system having a storage system, comprising the 
steps of: 

(a) retrieving, by the computer system, user linguistic data
previously provided by the user, said user linguistic 
data comprising at least one text item, each said at least 
one text item comprising at least one sentence; 

(b) generating, by the computer system, an empty user 
data profile;

(c) retrieving, by the computer system, a text item from 
said user linguistic data; 

(d) separating, by the computer system, said text item into
at least one sentence;

(e) extracting, from each of said at least one sentence, by 
the computer system, at least one segment representa- 
tive of a linguistic pattern of each sentence of said at 
least one sentence; 

(f) adding, by the computer system, at least one segment
extracted at said step (e) to said user data profile; 

(g) repeating, by the computer system, said steps (c) to (f) 
for each text item of said at least one text item in said 
user linguistic data; 

(h) generating at least one user segment group, by the
computer system, by grouping together identical
segments of said at least one segment;

(i) determining a user segment count, by the computer
system, for each user segment group of said at least one 
user segment group, each said user segment count 
being representative of a number of identical segments 
in the corresponding user segment group of said at least 
one user segment group, and linking each said user 
segment count to the corresponding user segment group 
of said at least one user segment group; 

(j) sorting the user segment groups of said at least one user 
segment group, by the computer system, in a descending
order of user segment counts starting from a user
segment group having a highest user segment count, 
and recording said user segment groups and corre- 
sponding user segment counts in said user data profile; 
and 

(k) storing, by the computer system, said user data profile, 
representative of an overall linguistic pattern of the 
user, in the data storage system, said overall linguistic 
pattern substantially corresponding to the user's social, 
cultural, educational, economic background and to the 
user's psychological profile. 

46. The method of claim 45, further comprising the step 
of: 

(l) prior to said step (a), automatically adding, by the
computer system, textual data generated by the user 
during utilization of the computer system to said user 
linguistic data. 

47. The method of claim 45, wherein said user linguistic 
data comprises at least one of: personal textual data gener- 
ated by the user and favorite textual data generated by a 
source other than the user and that the user has adopted as 
being favorite. 

48. The method of claim 47, further comprising the step
of:

(m) prior to said step (a), selecting, by the user at least one 
of said personal textual data and said favorite textual 
data, from textual data stored in the data storage 
system. 

49. The method of claim 45, wherein said step (d) 
comprises the step of: 

(n) determining a word count by sequentially counting 
words of said text item; 

(o) when an end of sentence mark is reached before said
word count reaches a predefined word limit, storing 
said counted words as a sentence, restarting said word 
count, and repeating said step (n) starting after a last 
word of said stored sentence; and 

(p) when said word count reaches said predefined word 
limit, storing said counted words as a sentence, restart- 
ing said word count, and repeating said step (n) starting 
after a last word of said stored sentence. 

50. The method of claim 49, wherein said end of sentence 
mark comprises one of: a period, an exclamation mark, and 
a question mark. 

51. The method of claim 45, wherein said step (e) 
comprises the steps, performed for each sentence of said at 
least one sentence, of: 

(q) identifying and tagging each word in a sentence as one 
of a predetermined plurality of different parts of 
speech; and 

(r) arranging a predetermined number of said tagged 
words in a predetermined order of said predetermined 
plural different parts of speech to compose at least one 
segment for each possible combination of said prede- 
termined number of said tagged words arranged in said 
predetermined order, said at least one segment being 
representative of a linguistic pattern of said sentence.

52. The method of claim 51, further comprising the step
of:

(s) after said step (q), determining whether each word may 
serve as an additional part of speech, and when a word 
may serve as an additional part of speech, adding an
additional tag to said word to identify said word as said 
additional part of speech. 

53. The method of claim 51, wherein said predetermined 
plurality of different parts of speech comprises at least one 
of: noun, pronoun, verb, adverb, adjective, gerund,
preposition, conjunction and interjection.

54. The method of claim 51, wherein said predetermined 
plurality of different parts of speech comprises a noun, a 
verb and an adjective, wherein said predetermined number 
is three, and wherein said predetermined order is noun, verb, 
adjective.

55. The method of claim 51, wherein said step (r) further 
comprises the step of: 

(t) when one of said predetermined plural different parts 
of speech is missing from said sentence, inserting a 
blank mark into said segment instead of said missing
predetermined part of speech. 

56. The method of claim 45, wherein said step (k) further 
comprises the step of: 

(u) encrypting said user data profile such that said
encrypted user data profile may only be utilized when 
an authorization is received from the user. 

57. The method of claim 45, wherein said step (j) further 
comprises the step of: 

(v) recording, in said user data profile, only a first
predetermined portion of said at least one user segment
groups having highest user segment counts. 

58. The method of claim 57, wherein said first predeter- 
mined portion comprises one of: 5,000 user segment groups 
and a top five percent of said at least one user segment
groups. 
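
The selection recited in claims 57 and 58 might be sketched as follows, assuming the user segment groups are held as a mapping from segment to count. The function name `select_top_groups`, its parameters, and the rule of keeping at least one group are illustrative assumptions, not part of the claims.

```python
from collections import Counter


def select_top_groups(segment_counts, max_groups=5000, top_fraction=None):
    """Keep only the segment groups with the highest counts.

    Pass top_fraction (for example 0.05) to keep a share of the groups
    instead of a fixed number of groups.
    """
    counts = Counter(segment_counts)
    if top_fraction is not None:
        # Assumption: keep at least one group so a small profile is never emptied.
        limit = max(1, int(len(counts) * top_fraction))
    else:
        limit = max_groups
    # most_common returns (segment, count) pairs sorted by descending count.
    return dict(counts.most_common(limit))


counts = {
    ("dog", "runs", "fast"): 7,
    ("cat", "sleeps", "_"): 3,
    ("bird", "sings", "loud"): 1,
}
print(select_top_groups(counts, max_groups=2))
# {('dog', 'runs', 'fast'): 7, ('cat', 'sleeps', '_'): 3}
```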

59. A data processing system, comprising a local com- 
puter system having a local data storage system, and a 
remote computer system having a remote data storage, the 
remote computer system being linked to the local computer
system by a telecommunication link, for enabling a user of 
the local computer system to locate desired data from a 
plurality of data items stored in the remote data storage 
system, the data processing system comprising: 

first extracting means, in one of the local computer system
and the remote computer system, for extracting a user 
profile from user linguistic data previously provided by 
the user, said user data profile being representative of a 
first linguistic pattern of the said user linguistic data; 

first control means, in one of the local computer system
and the remote computer system, for constructing a 
plurality of data item profiles, each plural data item 
profile corresponding to a different one of each plural 
data item stored in the remote data storage system, each 
of said plural data item profiles being representative of
a second linguistic pattern of a corresponding plural 
data item, each said plural second linguistic pattern 
being substantially unique to each corresponding plural 
data item; 

first input means, in the local computer system, for
acquiring search request data from the user, said search 
request data being representative of the user's 
expressed desire to locate data in the remote storage 
system substantially pertaining to said search request 
data;

second extracting means, in one of the local computer 
system and the remote computer system, connected to 
said first input means, for extracting a search request
profile from said acquired search request data, said 
search request profile being representative of a third 
linguistic pattern of said search request data; 
second control means, in one of the local computer system 
and the remote computer system, connected to said first 
extracting means and said second extracting means, for 
determining a first similarity factor representative of a 
first correlation between said search request profile and 
said user profile by comparing said search request 
profile to said user profile; 
third control means, in one of the local computer system 
and the remote computer system, connected to said first 
control means and said second extracting means, for 
determining a plurality of second similarity factors, 
each said plural second similarity factor being repre- 
sentative of a second correlation between said search 
request profile and a different one of said plural data 
item profiles, by comparing said search request profile 
to each of said plural data item profiles; 
fourth control means, in one of the local computer system 
and the remote computer system, connected to said 
second and said third control means, for calculating a
final match factor for each of said plural data item 
profiles, by adding said first similarity factor to at least 
one of said plural second similarity factors in accor- 
dance with at least one intersection between said first 
correlation and said second correlation; 
first selection means, in one of the local computer system 
and the remote computer system, connected to said 
fourth control means, for selecting one of said plural 
data items corresponding to a plural data item profile 
having a highest final match factor; and 
first retrieving means, in one of the local computer system 
and the remote computer system, connected to said first 
selection means, for retrieving, from the remote data 
storage system, said selected data item for display to 
the user, such that the user is presented with a data item 
having linguistic characteristics that substantially cor- 
respond to linguistic characteristics of the linguistic 
data generated by the user, whereby the linguistic 
characteristics of the data item correspond to the user's 
social, cultural, educational, economic background as 
well as to the user's psychological profile. 
60. A data processing system, comprising a computer 
system having a data storage system, for enabling a user of 
the computer system to locate desired data from a plurality 
of data items stored in the data storage system, the data 
processing system comprising: 

first extracting means for extracting a user profile from 
user linguistic data previously provided by the user, 
said user data profile being representative of a first 
linguistic pattern of the said user linguistic data; 
first control means for constructing a plurality of data item 
profiles, each plural data item profile corresponding to 
a different one of each plural data item stored in the 
data storage system, each of said plural data item 
profiles being representative of a second linguistic 
pattern of a corresponding plural data item, each said 
plural second linguistic pattern being substantially 
unique to each corresponding plural data item; 
first input means for acquiring search request data from 
the user, said search request data being representative 
of the user's expressed desire to locate data in the 
storage system substantially pertaining to said search 
request data;

second extracting means, connected to said first input 
means, for extracting a search request profile from said 
acquired search request data, said search request profile 
being representative of a third linguistic pattern of said 
search request data;

second control means, connected to said first extracting 
means and said second extracting means, for determin- 
ing a first similarity factor representative of a first 
correlation between said search request profile and said 
user profile by comparing said search request profile to
said user profile; 

third control means, connected to said first control means 
and said second extracting means, for determining a 
plurality of second similarity factors, each said plural 
second similarity factor being representative of a second
correlation between said search request profile and
a different one of said plural data item profiles, by 
comparing said search request profile to each of said 
plural data item profiles; 

fourth control means, connected to said second and said
third control means, for calculating a final match factor 
for each of said plural data item profiles, by adding said 
first similarity factor to at least one of said plural 
second similarity factors in accordance with at least one
intersection between said first correlation and said
second correlation; 

first selection means, connected to said fourth control 
means, for selecting one of said plural data items 
corresponding to a plural data item profile having a 
highest final match factor; and 

first retrieving means, connected to said first selection 
means, for retrieving, from the data storage system, 
said selected data item for display to the user, such that 
the user is presented with a data item having linguistic 
characteristics that substantially correspond to linguis- 
tic characteristics of the linguistic data generated by the 
user, whereby the linguistic characteristics of the data 
item correspond to the user's social, cultural, 
educational, economic background as well as to the 
user's psychological profile. 
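
One possible reading of the final match factor recited in claims 59 and 60 is sketched below in Python. It assumes each profile is a mapping from linguistic pattern to frequency, treats a "correlation" as the set of patterns two profiles share, and sums user and document frequencies over the patterns found in all three profiles. The function names and this particular scoring rule are assumptions for illustration; the claims do not fix a formula.

```python
def shared_patterns(profile_a, profile_b):
    """Return the patterns present in both profiles (the assumed 'correlation')."""
    return set(profile_a) & set(profile_b)


def final_match_factor(user, search, document):
    """Sum user and document frequencies over patterns common to all three profiles."""
    first = shared_patterns(search, user)       # search request vs. user profile
    second = shared_patterns(search, document)  # search request vs. data item profile
    common = first & second                     # intersection of the two correlations
    return sum(user[p] + document[p] for p in common)


def best_data_item(user, search, documents):
    """Return the name of the data item profile with the highest final match factor."""
    return max(documents, key=lambda name: final_match_factor(user, search, documents[name]))


user = {"A": 5, "B": 2}
search = {"A": 1, "C": 1}
documents = {"doc1": {"A": 3, "C": 4}, "doc2": {"B": 9, "C": 1}}
print(final_match_factor(user, search, documents["doc1"]))  # 8
print(best_data_item(user, search, documents))              # doc1
```

Here only pattern "A" occurs in the user, search request and "doc1" profiles, so "doc1" scores 5 + 3 = 8, while "doc2" shares no pattern with both the search request and the user and scores 0.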

61. The method of claim 1, wherein the remote computer 
system comprises a plurality of computer systems connected 
to the Internet and the World Wide Web. 

62. The system of claim 59, wherein the remote computer 
system comprises a plurality of computer systems connected 
to the Internet and the World Wide Web. 